anthonygtellez / analytics_toolkit

Toolkit for Machine Learning & Analytics Use Cases.
13 stars, 1 fork

New Metrics to Create #1

Open anthonygtellez opened 6 years ago

anthonygtellez commented 6 years ago

| Keyword | Description |
|---|---|
| CLM | two-sided confidence limit for the mean |
| CSS | corrected sum of squares (sum of squared deviations from the mean) |
| CV | coefficient of variation (standard deviation relative to the mean) |
| KURTOSIS | kurtosis |
| LCLM | one-sided confidence limit below the mean |
| MAX | maximum value |
| MEAN | average |
| MIN | minimum value |
| N | number of observations with nonmissing values |
| NMISS | number of observations with missing values |
| RANGE | range (MAX - MIN) |
| SKEWNESS | skewness |
| STDDEV/STD | standard deviation |
| STDERR | standard error of the mean |
| SUM | sum |
| SUMWGT | sum of the weight variable values |
| UCLM | one-sided confidence limit above the mean |
| USS | uncorrected sum of squares (sum of squared values) |
| VAR | variance |
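These keywords look like the statistic keywords of SAS PROC MEANS. As a reference for checking whatever macros end up implementing them, here is a minimal Python sketch of most of the list. It assumes sample (n - 1) denominators for VAR/STD and the bias-adjusted sample formulas for SKEWNESS and KURTOSIS, and reports CV as a percentage; Splunk's own stats functions may use other conventions. The `describe` helper is hypothetical, and CLM/LCLM/UCLM and SUMWGT are left out because they need a t quantile and a weight variable.

```python
import math
from typing import Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> dict:
    """Compute a subset of the keywords above for one series (None = missing).

    Assumes sample (n - 1) denominators for VAR/STD and the bias-adjusted
    sample formulas for SKEWNESS and KURTOSIS. CLM/LCLM/UCLM and SUMWGT are
    omitted: they need a t quantile and a weight variable.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if n < 4:
        raise ValueError("KURTOSIS needs at least 4 nonmissing values")
    total = sum(present)
    mean = total / n
    css = sum((v - mean) ** 2 for v in present)  # corrected sum of squares
    uss = sum(v * v for v in present)            # uncorrected sum of squares
    var = css / (n - 1)
    std = math.sqrt(var)
    if std == 0:
        raise ValueError("constant series: SKEWNESS/KURTOSIS are undefined")
    z = [(v - mean) / std for v in present]      # standardized values
    skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "N": n,
        "NMISS": len(values) - n,
        "SUM": total,
        "MEAN": mean,
        "MIN": min(present),
        "MAX": max(present),
        "RANGE": max(present) - min(present),
        "CSS": css,
        "USS": uss,
        "VAR": var,
        "STD": std,
        "STDERR": std / math.sqrt(n),
        # Assumption: CV as a percentage; undefined (NaN) when the mean is 0
        "CV": 100 * std / mean if mean != 0 else math.nan,
        "SKEWNESS": skewness,
        "KURTOSIS": kurtosis,
    }
```

For example, `describe([2, 4, 4, 4, 5, 5, 7, 9, None])` gives N = 8, NMISS = 1, MEAN = 5, CSS = 32 and USS = 232 (note USS = CSS + N * MEAN²).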
anthonygtellez commented 6 years ago

| Keyword | Description |
|---|---|
| MEDIAN/P50 | median or 50th percentile |
| P1 | 1st percentile |
| P5 | 5th percentile |
| P10 | 10th percentile |
| Q1/P25 | lower quartile or 25th percentile |
| Q3/P75 | upper quartile or 75th percentile |
| P90 | 90th percentile |
| P95 | 95th percentile |
| P99 | 99th percentile |
| QRANGE | difference between upper and lower quartiles: Q3 - Q1 |
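Percentile definitions differ between tools (SAS has several PCTLDEF options, and Splunk has its own perc/exactperc functions), so small samples can give slightly different values. A minimal Python sketch using one common definition, linear interpolation between order statistics via `statistics.quantiles(..., method="inclusive")`; the `percentiles` helper is hypothetical and is not what any Splunk macro necessarily does.

```python
import statistics
from typing import Dict, Sequence


def percentiles(values: Sequence[float]) -> Dict[str, float]:
    """Percentile keywords above, using linear interpolation (one of several definitions)."""
    # 99 cut points: cuts[p - 1] is the p-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    out = {f"P{p}": cuts[p - 1] for p in (1, 5, 10, 25, 50, 75, 90, 95, 99)}
    out["MEDIAN"] = out["P50"]
    out["Q1"], out["Q3"] = out["P25"], out["P75"]
    out["QRANGE"] = out["Q3"] - out["Q1"]  # interquartile range
    return out
```

With the values 0 through 100, every Pp comes out as exactly p, so QRANGE is 50.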
MacbethX commented 6 years ago

Remember that if we store ALL of these values per entity per time slice, we are going to bloat our summary indexes hugely :(

I recommend we create different summary index "templates" using these "macros", so we don't fall into sistats traps where the summary indexes get HUGE.
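To make the "templates" idea concrete, here is a minimal Python sketch: each template names only the subset of metrics its summary index stores, so a row per entity per time slice carries a handful of fields instead of the whole list. The metric registry, the template names "basic" and "spread", and the `summarize` helper are all hypothetical; in Splunk the templates would presumably be macros feeding separate summary indexes.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical metric registry: keyword -> function over one time slice
METRICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "N": len,
    "MEAN": statistics.fmean,
    "MIN": min,
    "MAX": max,
    "STD": statistics.stdev,
    "MEDIAN": statistics.median,
}

# Hypothetical templates: each summary index stores only its own subset
TEMPLATES: Dict[str, List[str]] = {
    "basic": ["N", "MEAN", "MIN", "MAX"],
    "spread": ["N", "STD", "MEDIAN"],
}


def summarize(slice_values: Dict[str, Sequence[float]], template: str) -> List[dict]:
    """Build one summary row per entity containing only the template's metrics."""
    keys = TEMPLATES[template]
    rows = []
    for entity, values in sorted(slice_values.items()):
        row = {"entity": entity}
        row.update({k: METRICS[k](values) for k in keys})
        rows.append(row)
    return rows
```

For instance, `summarize({"hostA": [1, 2, 3], "hostB": [4, 6]}, "basic")` yields two rows with only `entity`, `N`, `MEAN`, `MIN` and `MAX`.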

anthonygtellez commented 6 years ago

Would be nice to have a macro that calculates the delta between successive kmeans distortion values (to find the knee) and saves it into a field.

| kmeans k=2-10 ... | eval delta = something |
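The delta-and-knee logic can be sketched in Python, assuming the distortion for each k has already been collected into a dict (the field kmeans would emit it in is not specified here). The `distortion_deltas` and `knee` helpers are hypothetical, and "largest drop in improvement" (the biggest second difference) is just one simple elbow heuristic, not the only one.

```python
from typing import Dict


def distortion_deltas(distortion: Dict[int, float]) -> Dict[int, float]:
    """delta[k] = distortion[previous k] - distortion[k], over successive k values."""
    ks = sorted(distortion)
    return {k: distortion[prev] - distortion[k] for prev, k in zip(ks, ks[1:])}


def knee(distortion: Dict[int, float]) -> int:
    """Return the k after which the improvement falls off the most.

    Heuristic: maximize the second difference delta[k] - delta[next k].
    """
    deltas = distortion_deltas(distortion)
    ks = sorted(deltas)
    if len(ks) < 2:
        raise ValueError("need distortion for at least three k values")
    best_k, _ = max(zip(ks, ks[1:]), key=lambda pair: deltas[pair[0]] - deltas[pair[1]])
    return best_k
```

With made-up distortions `{2: 100, 3: 60, 4: 25, 5: 20, 6: 17, 7: 15}`, the deltas are 40, 35, 5, 3, 2 for k = 3..7, and the knee is k = 4, where the curve flattens. On the Splunk side, the `delta` command computes a similar first difference between consecutive results, but as current minus previous, so its sign is flipped relative to this sketch.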
anthonygtellez commented 6 years ago

Add documentation on the following apps/commands