business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Bug: Inconsistency with tsibble #53

Closed edgBR closed 2 years ago

edgBR commented 4 years ago

Dear colleagues,

I had code that was working as follows:

# set the tibbletime index for anomaly detection
GlobalDemand <- as_tbl_time(x = GlobalDemand, index = snsr_ts)
GlobalDemandCleaned <- GlobalDemand %>%
  time_decompose(target = global_demand, method = "twitter") %>%
  anomalize(target = remainder, method = "gesd", alpha = 0.2, max_anoms = 0.2) %>%
  clean_anomalies() %>%
  rename(snsr_ts = snsr_dt)

This code used tibbletime and detected my sub-hourly frequency (30 minutes) properly. Given the status of tibbletime, I decided to migrate to tsibble, and now I am doing the following:

GlobalDemand <- as_tsibble(x = aggregated_data_df, index = snsr_ts, regular = TRUE)

# the tsibble index (snsr_ts) is expected to be used for anomaly detection

GlobalDemandCleaned <- GlobalDemand %>% 
  time_decompose(target = global_demand, method = "twitter") %>%
  anomalize(target = remainder, method = "gesd", alpha = 0.2, max_anoms = 0.2) %>%
  clean_anomalies() %>% 
  rename(snsr_ts = snsr_dt)

However, it seems that anomalize ignores the tsibble index and sets snsr_dt as the index:

GlobalDemand %>%  time_decompose(target = global_demand, method = "twitter") 
Converting from tbl_ts to tbl_time. 
Auto-index message: index = snsr_dt 
Error: Problem with `mutate()` input `snsr_dt`.
x Only year, quarter, month, week, and day periods are allowed for an index of class Date
ℹ Input `snsr_dt` is `collapse_index(...)`.
Run `rlang::last_error()` to see where the error occurred.

Last error and trace as follows:

rlang::last_error()
<error/dplyr_error>
Problem with `mutate()` input `snsr_dt`.
x Only year, quarter, month, week, and day periods are allowed for an index of class Date
ℹ Input `snsr_dt` is `collapse_index(...)`.
Backtrace:
  9. anomalize::time_decompose(., target = global_demand, method = "twitter")
 12. anomalize:::time_decompose.tbl_time(...)
 21. anomalize::decompose_twitter(...)
 22. anomalize::time_frequency(data, period = frequency, message = message)
 30. tibbletime::collapse_by(., period = periodicity_target)
 36. dplyr:::mutate.data.frame(...)
 37. dplyr:::mutate_cols(.data, ...)
Run `rlang::last_trace()` to see the full context.

rlang::last_trace()
<error/dplyr_error>
Problem with `mutate()` input `snsr_dt`.
x Only year, quarter, month, week, and day periods are allowed for an index of class Date
ℹ Input `snsr_dt` is `collapse_index(...)`.
Backtrace:      
█   1. └─GlobalDemand %>% time_decompose(target = global_demand, method = "twitter")   
2.   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))  
3.   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)   
4.     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)   
5.       └─`_fseq`(`_lhs`)   
6.         └─magrittr::freduce(value, `_function_list`)   
7.           ├─base::withVisible(function_list[[k]](value))   
8.           └─function_list[[k]](value)   
9.             ├─anomalize::time_decompose(., target = global_demand, method = "twitter")  
10.             └─anomalize:::time_decompose.tbl_df(...)  
11.               ├─anomalize::time_decompose(...)  
12.               └─anomalize:::time_decompose.tbl_time(...)  
13.                 └─`%>%`(...)  
14.                   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))  
15.                   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)  
16.                     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)  
17.                       └─anomalize:::`_fseq`(`_lhs`)  
18.                         └─magrittr::freduce(value, `_function_list`)  
19.                           ├─base::withVisible(function_list[[k]](value))  
20.                           └─function_list[[k]](value)  
21.                             └─anomalize::decompose_twitter(...)  
22.                               └─anomalize::time_frequency(data, period = frequency, message = message)  
23.                                 └─`%>%`(...)  
24.                                   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))  
25.                                   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)  
26.                                     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)  
27.                                       └─anomalize:::`_fseq`(`_lhs`)  
28.                                         └─magrittr::freduce(value, `_function_list`)  
29.                                           └─function_list[[i]](value)  
30.                                             └─tibbletime::collapse_by(., period = periodicity_target)  
31.                                               ├─dplyr::mutate(...)  
32.                                               ├─tibbletime:::mutate.tbl_time(...)  
33.                                               │ ├─tibbletime::reconstruct(NextMethod(), copy_.data)  
34.                                               │ └─tibbletime:::reconstruct.tbl_time(NextMethod(), copy_.data)  
35.                                               ├─base::NextMethod()  
36.                                               └─dplyr:::mutate.data.frame(...)  
37.                                                 └─dplyr:::mutate_cols(.data, ...) 
<error/assertError> Only year, quarter, month, week, and day periods are allowed for an index of class Date

edgBR commented 4 years ago

Hi,

Is this package maintained?

BR /Edgar

mdancho84 commented 4 years ago

@edgBR ,

Please be patient. Yes - this project is actively being maintained. My last commit was 27 days ago, which is relatively recent.

Regarding your tsibble question, try converting to a tibble first, then use anomalize. I don't know what is required to make the two work together, but converting to a tibble first should be an easy fix.

-Matt

edgBR commented 4 years ago

Dear @mdancho84, since you mention tsibble in your tibbletime repo, I thought this was the way to go.

Anyway, even if I have the data as a tibble, the library automatically detects snsr_dt as the index when it tries to auto-convert to tbl_time:


> GlobalDemand <- as_tibble(x = GlobalDemand, index = snsr_ts)
> GlobalDemandCleaned <- GlobalDemand %>%
+     time_decompose(target = global_demand, method = "twitter", ) %>%
+     anomalize(target = remainder, method = "gesd", alpha = 0.05, max_anoms = 0.1) %>%
+     clean_anomalies() %>%
+     rename(snsr_ts = snsr_dt)
Converting from tbl_df to tbl_time.
Auto-index message: index = snsr_dt

I would just like to avoid using tibbletime if it is going to be deprecated.

BR /E

mdancho84 commented 4 years ago

tibbletime has not been deprecated, just retired, which has a different meaning in lifecycle terms. We are still maintaining and supporting it, just not building new features. As such, anomalize, which leverages tibbletime, would require a rewrite to work with tsibble, tibble, and tibbletime alike. The solution is timetk, discussed below.
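Before moving on to timetk, a possible stopgap for the auto-index problem reported above is to declare the index explicitly, as the original tibbletime code did, instead of letting anomalize pick the first date-like column. The sketch below is an assumption-laden illustration, not a confirmed fix: the data are synthetic (only the column names snsr_dt, snsr_ts, and global_demand come from this thread), and it assumes anomalize and tibbletime versions contemporary with this issue. A tsibble would first be converted with as_tibble().

```r
library(dplyr)
library(tibbletime)
library(anomalize)

# Synthetic 30-minute series (hypothetical values): 60 days with a daily cycle.
set.seed(42)
ts <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
          by = "30 min", length.out = 48 * 60)
demand <- tibble(
  snsr_dt       = as.Date(ts),  # Date column that the auto-index would pick up
  snsr_ts       = ts,           # sub-hourly timestamp we actually want as index
  global_demand = 100 + 10 * sin(2 * pi * seq_along(ts) / 48) + rnorm(length(ts))
)

# Declare the index explicitly so no auto-detection takes place.
demand_tt <- as_tbl_time(demand, index = snsr_ts)

decomposed <- demand_tt %>%
  time_decompose(target = global_demand, method = "twitter")

cleaned <- decomposed %>%
  anomalize(target = remainder, method = "gesd", alpha = 0.05, max_anoms = 0.1) %>%
  clean_anomalies()
```

If this behaves as in the original report, the decomposition keeps snsr_ts as its time column, so the trailing rename() from the earlier snippets should no longer be needed.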

Timetk Anomaly Detection

The timetk package may be of interest since it uses just the tibble (not tsibble), which prevents issues like what you are experiencing. timetk has plot_anomaly_diagnostics() and tk_anomaly_diagnostics().
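As a rough sketch of what that could look like for the 30-minute data in this thread, the example below passes the timestamp column to tk_anomaly_diagnostics() by name, so there is no auto-detected index to go wrong. The data are synthetic and the .alpha and .max_anomalies values are illustrative assumptions; only the column names are taken from the issue.

```r
library(dplyr)
library(timetk)

# Synthetic 30-minute series (hypothetical values): 60 days with a daily cycle.
set.seed(42)
ts <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
          by = "30 min", length.out = 48 * 60)
demand <- tibble(
  snsr_dt       = as.Date(ts),
  snsr_ts       = ts,
  global_demand = 100 + 10 * sin(2 * pi * seq_along(ts) / 48) + rnorm(length(ts))
)

# The date variable is named explicitly; a plain tibble is all that is needed.
diagnostics <- demand %>%
  tk_anomaly_diagnostics(
    .date_var      = snsr_ts,
    .value         = global_demand,
    .alpha         = 0.05,
    .max_anomalies = 0.1
  )
```

The same arguments can be passed to plot_anomaly_diagnostics() to visualize the flagged points instead of returning a table.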
