business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Anomalize a grouped tsibble #44

Open spsanderson opened 4 years ago

spsanderson commented 4 years ago

I am using anomalize with a tsibble object. Because the data is aggregated to monthly with index_by() from tsibble, the index column is of class yearmonth, and the anomalize pipeline fails with an unsupported 'indexClass' error for type yearmonth.

The data I bring in is daily with no gaps.

Here is my code:

> # Lib Load ####
> install.load::install_load(
+   "tidyquant"
+   , "fable"
+   , "fabletools"
+   , "feasts"
+   , "tsibble"
+   , "timetk"
+   , "sweep"
+   , "anomalize"
+   , "xts"
+   # , "fpp"
+   # , "forecast"
+   , "lubridate"
+   , "dplyr"
+   , "urca"
+   # , "prophet"
+   , "ggplot2"
+ )
> # Get File ####
> fileToLoad <- file.choose(new = TRUE)
> arrivals <- read.csv(fileToLoad)
> View(arrivals)
> arrivals$Time <- mdy(arrivals$Time)
> # Coerce to tsibble ----
> df_tsbl <- arrivals %>%
+   as_tsibble(index = Time)
> df_tsbl
# A tsibble: 6,908 x 2 [1D]
   Time       DSCH_COUNT
   <date>          <int>
 1 2001-01-01         22
 2 2001-01-02         30
 3 2001-01-03         43
 4 2001-01-04         30
 5 2001-01-05         38
 6 2001-01-06         22
 7 2001-01-07         29
 8 2001-01-08         37
 9 2001-01-09         33
10 2001-01-10         52
# ... with 6,898 more rows
> interval(df_tsbl)
1D
> count_gaps(df_tsbl)
# A tibble: 0 x 3
# ... with 3 variables: .from <date>, .to <date>, .n <int>
> # Make Monthly ----
> df_monthly_tsbl <- df_tsbl %>%
+   index_by(Year_Month = ~ yearmonth(.)) %>%
+   summarise(Count = sum(DSCH_COUNT, na.rm = TRUE))
> df_monthly_tsbl           
# A tsibble: 227 x 2 [1M]
   Year_Month Count
        <mth> <int>
 1   2001 Jan  1067
 2   2001 Feb   919
 3   2001 Mar  1024
 4   2001 Apr  1010
 5   2001 May  1056
 6   2001 Jun   995
 7   2001 Jul  1002
 8   2001 Aug  1076
 9   2001 Sep   982
10   2001 Oct   971
# ... with 217 more rows

> # Anomalize ----
> df_monthly_tsbl %>%
+   time_decompose(Count, method = "twitter") %>%
+   anomalize(remainder, method = "gesd") %>%
+   clean_anomalies() %>%
+   time_recompose()
Converting from tbl_ts to tbl_time.
Auto-index message: index = Year_Month
Error in index.xts(x) : unsupported ‘indexClass’ indexing type: yearmonth
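
One possible workaround, sketched here under assumptions and not confirmed as the maintainers' fix: since the error complains about the yearmonth index class, drop the tsibble class and convert the index to a plain Date before calling time_decompose(). The daily data below is synthetic (made-up Poisson counts standing in for the arrivals file), and the name df_monthly_tbl is hypothetical.

```r
library(dplyr)
library(tsibble)
library(anomalize)

# Synthetic stand-in for the daily arrivals data (values are made up;
# only the date range mirrors the 6,908 daily rows shown above).
set.seed(1)
df_tsbl <- tibble(
  Time = seq(as.Date("2001-01-01"), as.Date("2019-11-30"), by = "day"),
  DSCH_COUNT = rpois(length(Time), lambda = 33)
) %>%
  as_tsibble(index = Time)

# Same monthly aggregation as in the report above.
df_monthly_tsbl <- df_tsbl %>%
  index_by(Year_Month = ~ yearmonth(.)) %>%
  summarise(Count = sum(DSCH_COUNT, na.rm = TRUE))

# Assumed workaround: drop the tsibble class and turn the yearmonth
# index into a Date (first day of each month).
df_monthly_tbl <- df_monthly_tsbl %>%
  as_tibble() %>%
  mutate(Year_Month = as.Date(Year_Month))

# Same anomalize pipeline, now on a tibble with a Date index.
df_monthly_tbl %>%
  time_decompose(Count, method = "twitter") %>%
  anomalize(remainder, method = "gesd") %>%
  clean_anomalies() %>%
  time_recompose()
```

The conversion keeps one row per month, so the monthly counts are unchanged; only the index class differs.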

Session info:

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_3.2.1              urca_1.3-0                 dplyr_0.8.3               
 [4] anomalize_0.2.0            sweep_0.2.2                timetk_0.1.2              
 [7] tsibble_0.8.5              feasts_0.1.1               fable_0.1.1               
[10] fabletools_0.1.1           tidyquant_0.5.9            quantmod_0.4-15           
[13] TTR_0.23-6                 PerformanceAnalytics_1.5.3 xts_0.11-2                
[16] zoo_1.8-6                  lubridate_1.7.4           

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5   purrr_0.3.3        lattice_0.20-38    colorspace_1.4-1  
 [5] vctrs_0.2.1        generics_0.0.2     utf8_1.1.4         rlang_0.4.2       
 [9] pillar_1.4.3       tibbletime_0.1.3   glue_1.3.1         withr_2.1.2       
[13] lifecycle_0.1.0    stringr_1.4.0      Quandl_2.10.0      munsell_0.5.0     
[17] anytime_0.3.6      gtable_0.3.0       labeling_0.3       curl_4.3          
[21] fansi_0.4.1        broom_0.5.3        Rcpp_1.0.3         backports_1.1.5   
[25] scales_1.1.0       install.load_1.2.1 jsonlite_1.6       farver_2.0.2      
[29] gridExtra_2.3      digest_0.6.23      packrat_0.5.0      stringi_1.4.3     
[33] grid_3.5.3         quadprog_1.5-8     cli_2.0.1          tools_3.5.3       
[37] magrittr_1.5       lazyeval_0.2.2     tibble_2.1.3       crayon_1.3.4      
[41] tidyr_1.0.0        pkgconfig_2.0.3    zeallot_0.1.0      assertthat_0.2.1  
[45] httr_1.4.1         rstudioapi_0.10    R6_2.4.1           nlme_3.1-137      
[49] compiler_3.5.3