business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Fixing GESD algorithm fail on low variance #29

Closed: beansrowning closed this 5 years ago

beansrowning commented 5 years ago

History

As seen in @TsvetaKoleva's issue (#12), the GESD algorithm implementation will fail on low-variance data.

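This was particularly the case for zero-inflated count data, which was not flagged as "low-variance" or "near-zero variance" by caret's nearZeroVar() or similar functions. When more than half of the values are identical (for example, zeros), the median absolute deviation returned by mad() is 0, so the z-score calculation divides by zero and produces NaN or Inf.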
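A minimal sketch of the failure, using a made-up zero-inflated series (the vector `x` below is hypothetical, not data from the issue), shows why the original z-score formula breaks:

```r
# Hypothetical zero-inflated count series: 8 of 10 values are zero
x <- c(0, 0, 0, 0, 0, 0, 0, 5, 0, 12)

median(x)  # 0
mad(x)     # 0: more than half of the absolute deviations from the median are zero

# Original z-score formula from the package (applied to the whole series here)
z_old <- abs(x - median(x)) / mad(x)
z_old      # NaN for each zero (0/0) and Inf for the non-zero counts (5/0, 12/0)
```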

What was done:

Previous:

z <- abs(x_new - median(x_new))/mad(x_new) # Z-scores

New:

z <- abs(x_new - median(x_new)) / (mad(x_new) + .Machine$double.eps) # Z-scores
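The effect of the patch on the same kind of hypothetical series can be checked directly. The vector `x` is illustrative only; `.Machine$double.eps` is the base R constant used in the fix.

```r
# Hypothetical zero-inflated count series (same shape as the failing case)
x <- c(0, 0, 0, 0, 0, 0, 0, 5, 0, 12)

# Patched z-score: adding machine epsilon keeps the denominator strictly positive
z_new <- abs(x - median(x)) / (mad(x) + .Machine$double.eps)

all(is.finite(z_new))  # TRUE: no NaN or Inf left
which.max(z_new)       # 10: the largest count (12) is still the most extreme point
```

Because the denominator is now only about 2.2e-16, any value that differs from the median receives an enormous statistic (here roughly 2e16 and 5e16). In this degenerate case the guard prevents the NaN/Inf failure; it does not make the z-scores themselves meaningful.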

mdancho84 commented 5 years ago

Thanks Sean. Just merged!

Goutamravi commented 3 years ago

Sean,

I am using the anomalize package and still get the error below when I use the gesd method. However, for the same data, I do not get this error if I use the iqr method. Any thoughts?

Error in if (any(vals_tbl$outlier == "No")) { : missing value where TRUE/FALSE needed
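The message itself is base R's generic error for an `if ()` whose condition evaluates to `NA`, which happens when `any()` sees no `TRUE` but at least one `NA`. The thread does not establish which step inside `gesd()` produced the `NA`; the sketch below, with a hypothetical `outlier` vector, only reproduces the mechanism.

```r
# Hypothetical outlier column containing a missing value
outlier <- c("Yes", NA, "Yes")

cond <- any(outlier == "No")
cond  # NA: no comparison is TRUE and one comparison is NA

# Using that NA as an if() condition raises the error seen in the report
res <- tryCatch(
  if (cond) "some points are not outliers" else "all points are outliers",
  error = function(e) conditionMessage(e)
)
res   # "missing value where TRUE/FALSE needed"

# A defensive variant (an assumption, not the package's code): ignore NAs
any(outlier == "No", na.rm = TRUE)  # FALSE
```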

beansrowning commented 3 years ago

@Goutamravi Do you have a sample of data to reproduce? I might not have immediate capacity to look into it, but that might help elucidate what's going on.

Goutamravi commented 3 years ago

@beansrowning Thank you for your response. Unfortunately, I am running this through loops and cannot share the entire dataset. The values are small: it is daily sales data, with values ranging from 0 to 1000.

Error in if (any(vals_tbl$outlier == "No")) { : missing value where TRUE/FALSE needed
11. anomalize::gesd(x = x, alpha = alpha, max_anoms = max_anoms, verbose = TRUE)
10. anomalize.tbl_df(., remainder, alpha = 0.05, method = "gesd")
9. anomalize(., remainder, alpha = 0.05, method = "gesd")
8. function_list[k]
7. withVisible(function_list[k])
6. freduce(value, _function_list)
5. _fseq(_lhs)
4. eval(quote(_fseq(_lhs)), env, env)
3. eval(quote(_fseq(_lhs)), env, env)
2. withVisible(eval(quote(_fseq(_lhs)), env, env))
1. model_pre_proc %>% time_decompose(Volume_Agg, method = "stl") %>% anomalize(remainder, alpha = 0.05, method = "gesd")

Goutamravi commented 3 years ago

@beansrowning I found the issue. The model was being run on series containing only zero values. After I removed the dimensional fields that have all-zero values, it executes fine. Thanks 👍
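The workaround described here, skipping series that are entirely zero before running GESD, might look like the following base R sketch. The data frame `sales`, its `group` column, and the helper `keep_nonzero_groups()` are hypothetical; only the `Volume_Agg` column name comes from the traceback.

```r
# Hypothetical daily sales for three groups; groups A and C are all zeros
sales <- data.frame(
  group      = rep(c("A", "B", "C"), each = 5),
  Volume_Agg = c(0, 0, 0, 0, 0,   3, 0, 7, 1, 0,   0, 0, 0, 0, 0)
)

# Hypothetical helper: keep only groups whose series has at least one non-zero value
keep_nonzero_groups <- function(df, group_col, value_col) {
  has_signal <- tapply(df[[value_col]], df[[group_col]],
                       function(v) any(v != 0, na.rm = TRUE))
  df[df[[group_col]] %in% names(has_signal)[has_signal], , drop = FALSE]
}

filtered <- keep_nonzero_groups(sales, "group", "Volume_Agg")
unique(filtered$group)  # "B"
```

Each remaining group could then be passed through `time_decompose()` and `anomalize(method = "gesd")` as in the original pipeline.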