business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Error in mutate_impl(.data, dots): Class 'character' is not a known index class #22

Status: Closed (bearloga closed this issue 5 years ago)

bearloga commented 5 years ago

Hello! I'm having problems using my own data with the package. Here's a small sample of the data, followed by the full data set as R code:

date        installs
2014-10-01  23350
2014-10-02  23154
2014-10-03  22785
...         ...
2014-10-20  23041
2014-10-21  24170

x <- structure(
  list(
    date = structure(c(16344, 16345, 16346, 16347, 
                       16348, 16349, 16350, 16351, 16352, 16353, 16354, 16355, 16356, 
                       16357, 16358, 16359, 16360, 16361, 16362, 16363, 16364),
                     class = "Date"),
    installs = c(23350L, 23154L, 22785L, 24356L, 24234L, 22774L, 
                 22978L, 23028L, 22708L, 23510L, 25631L, 24591L, 22854L, 22540L, 
                 24313L, 24717L, 24169L, 26092L, 25254L, 23041L, 24170L)
  ),
  row.names = c(NA, -21L), class = c("tbl_df", "tbl", "data.frame")
)

When I run:

x_anomalized <- x %>%
  as_tbl_time("date") %>%
  time_decompose(installs) %>%
  anomalize(remainder) %>%
  time_recompose()

I get:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Class 'character' is not a known index class..
In addition: Warning messages:
1: In to_posixct_numeric.default(index) : NAs introduced by coercion
2: In to_posixct_numeric.default(index) : NAs introduced by coercion

The problem appears to be at the very first step with time_decompose(). This is my sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] bindrcpp_0.2.2   anomalize_0.1.1  tibbletime_0.1.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       rstudioapi_0.8   bindr_0.1.1     
 [4] magrittr_1.5     tidyselect_0.2.5 munsell_0.5.0   
 [7] lattice_0.20-38  colorspace_1.3-2 R6_2.3.0        
[10] rlang_0.3.0.1    stringr_1.3.1    plyr_1.8.4      
[13] dplyr_0.7.8      xts_0.11-2       tools_3.5.1     
[16] grid_3.5.1       nlme_3.1-137     broom_0.5.0     
[19] gtable_0.2.0     timetk_0.1.1.1   lazyeval_0.2.1  
[22] assertthat_0.2.0 tibble_1.4.2     crayon_1.3.4    
[25] purrr_0.2.5      ggplot2_3.1.0    tidyr_0.8.2     
[28] glue_1.3.0       stringi_1.2.4    compiler_3.5.1  
[31] pillar_1.3.0     backports_1.1.2  scales_1.0.0    
[34] lubridate_1.7.4  zoo_1.8-4        pkgconfig_2.0.2 

Please help. I compared my data with tidyverse_cran_downloads and I cannot figure out what I'm missing. Thank you!

DavisVaughan commented 5 years ago

Hi @bearloga, it looks like this is because you passed the quoted string "date" to as_tbl_time() rather than the bare column name date. Sorry for the pain this caused you; the error catching here isn't great. That's my fault.

library(anomalize)
library(tibbletime)
#> 
#> Attaching package: 'tibbletime'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(magrittr)

x <- structure(
  list(
    date = structure(c(16344, 16345, 16346, 16347, 
                       16348, 16349, 16350, 16351, 16352, 16353, 16354, 16355, 16356, 
                       16357, 16358, 16359, 16360, 16361, 16362, 16363, 16364),
                     class = "Date"),
    installs = c(23350L, 23154L, 22785L, 24356L, 24234L, 22774L, 
                 22978L, 23028L, 22708L, 23510L, 25631L, 24591L, 22854L, 22540L, 
                 24313L, 24717L, 24169L, 26092L, 25254L, 23041L, 24170L)
  ),
  row.names = c(NA, -21L), class = c("tbl_df", "tbl", "data.frame")
)

x_anomalized <- x %>%
  as_tbl_time(date) %>%
  time_decompose(installs) %>%
  anomalize(remainder) %>%
  time_recompose()
#> frequency = 5.5 days
#> trend = 21 days

x_anomalized
#> # A time tibble: 21 x 10
#> # Index: date
#>    date       observed season  trend remainder remainder_l1 remainder_l2
#>    <date>        <dbl>  <dbl>  <dbl>     <dbl>        <dbl>        <dbl>
#>  1 2014-10-01    23350  175.  23133.     42.1        -4592.        4750.
#>  2 2014-10-02    23154  -92.7 23187.     59.5        -4592.        4750.
#>  3 2014-10-03    22785  -92.7 23241.   -364.         -4592.        4750.
#>  4 2014-10-04    24356 -150.  23296.   1210.         -4592.        4750.
#>  5 2014-10-05    24234 -150.  23357.   1027.         -4592.        4750.
#>  6 2014-10-06    22774  -55.2 23418.   -589.         -4592.        4750.
#>  7 2014-10-07    22978  175.  23479.   -676.         -4592.        4750.
#>  8 2014-10-08    23028  -92.7 23539.   -418.         -4592.        4750.
#>  9 2014-10-09    22708  -92.7 23599.   -798.         -4592.        4750.
#> 10 2014-10-10    23510 -150.  23659.      1.47       -4592.        4750.
#> # ... with 11 more rows, and 3 more variables: anomaly <chr>,
#> #   recomposed_l1 <dbl>, recomposed_l2 <dbl>

Created on 2018-11-12 by the reprex package (v0.2.0).
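
For anyone wondering why "date" and date behave so differently, here is a minimal sketch of how tidy evaluation treats the two. It assumes the index argument is captured with rlang::enquo() and evaluated against the data; get_index() is a hypothetical helper written for illustration and is not part of tibbletime or anomalize.

```r
library(rlang)

# Hypothetical helper (not a tibbletime function): capture the argument
# as a quosure and evaluate it with the data frame as the data mask.
get_index <- function(data, index) {
  index_quo <- enquo(index)
  eval_tidy(index_quo, data = data)
}

df <- data.frame(
  date     = as.Date("2014-10-01") + 0:2,
  installs = c(23350L, 23154L, 22785L)
)

# A bare name is looked up in the data, so the Date column comes back.
class(get_index(df, date))
#> [1] "Date"

# A quoted name is just a string constant, so the string itself comes back.
class(get_index(df, "date"))
#> [1] "character"
```

Under that assumption, the quoted version hands the downstream code a character vector instead of a Date column, which is consistent with the "Class 'character' is not a known index class" message.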

bearloga commented 5 years ago

Ohhhhhh. Oops, my bad also. Thank you very much for the clarification, @DavisVaughan!