business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Anomalize fails on non-time series grouped data #55

Open larry77 opened 4 years ago

larry77 commented 4 years ago

Dear all, hopefully the reprex is self-explanatory. I plan to use anomalize on non-time series data. According to the documentation, it should still work (without the time series decomposition), and it does, but not when the non-time series data is grouped. Any ideas?


library(tidyverse)

library(anomalize)
#> ══ Use anomalize to improve your Forecasts by 50%! ═════════════════════════════
#> Business Science offers a 1-hour course - Lab #18: Time Series Anomaly Detection!
#> </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>

test1 <- tidyverse_cran_downloads %>%
    time_decompose(count) %>%
    anomalize(remainder)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo

print(test1)  ## this works fine
#> # A time tibble: 6,375 x 9
#> # Index:  date
#> # Groups: package [15]
#>    package date       observed season trend remainder remainder_l1 remainder_l2
#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>        <dbl>        <dbl>
#>  1 broom   2017-01-01    1053. -1007. 1708.    352.         -1725.        1704.
#>  2 broom   2017-01-02    1481    340. 1731.   -589.         -1725.        1704.
#>  3 broom   2017-01-03    1851    563. 1753.   -465.         -1725.        1704.
#>  4 broom   2017-01-04    1947    526. 1775.   -354.         -1725.        1704.
#>  5 broom   2017-01-05    1927    430. 1798.   -301.         -1725.        1704.
#>  6 broom   2017-01-06    1948    136. 1820.     -8.11       -1725.        1704.
#>  7 broom   2017-01-07    1542   -988. 1842.    688.         -1725.        1704.
#>  8 broom   2017-01-08    1479. -1007. 1864.    622.         -1725.        1704.
#>  9 broom   2017-01-09    2057    340. 1887.   -169.         -1725.        1704.
#> 10 broom   2017-01-10    2278    563. 1909.   -194.         -1725.        1704.
#> # … with 6,365 more rows, and 1 more variable: anomaly <chr>

test2 <- tidyverse_cran_downloads %>%
    group_by(package) %>% 
    time_decompose(count) %>%
    anomalize(remainder)

print(test2)  ## this also works fine
#> # A time tibble: 6,375 x 9
#> # Index:  date
#> # Groups: package [15]
#>    package date       observed season trend remainder remainder_l1 remainder_l2
#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>        <dbl>        <dbl>
#>  1 broom   2017-01-01    1053. -1007. 1708.    352.         -1725.        1704.
#>  2 broom   2017-01-02    1481    340. 1731.   -589.         -1725.        1704.
#>  3 broom   2017-01-03    1851    563. 1753.   -465.         -1725.        1704.
#>  4 broom   2017-01-04    1947    526. 1775.   -354.         -1725.        1704.
#>  5 broom   2017-01-05    1927    430. 1798.   -301.         -1725.        1704.
#>  6 broom   2017-01-06    1948    136. 1820.     -8.11       -1725.        1704.
#>  7 broom   2017-01-07    1542   -988. 1842.    688.         -1725.        1704.
#>  8 broom   2017-01-08    1479. -1007. 1864.    622.         -1725.        1704.
#>  9 broom   2017-01-09    2057    340. 1887.   -169.         -1725.        1704.
#> 10 broom   2017-01-10    2278    563. 1909.   -194.         -1725.        1704.
#> # … with 6,365 more rows, and 1 more variable: anomaly <chr>

## From the documentation:
## For non-time series data (data without trend), the anomalize()
## function can be used without time series decomposition.

test3 <- tidyverse_cran_downloads %>%
    select(-date) %>%
    filter(package=="broom") %>% 
    anomalize(count)

print(test3) ## OK!
#> # A tibble: 425 x 5
#>    count package count_l1 count_l2 anomaly
#>    <dbl> <chr>      <dbl>    <dbl> <chr>  
#>  1  1053 broom     -2535.    7965. No     
#>  2  1481 broom     -2535.    7965. No     
#>  3  1851 broom     -2535.    7965. No     
#>  4  1947 broom     -2535.    7965. No     
#>  5  1927 broom     -2535.    7965. No     
#>  6  1948 broom     -2535.    7965. No     
#>  7  1542 broom     -2535.    7965. No     
#>  8  1479 broom     -2535.    7965. No     
#>  9  2057 broom     -2535.    7965. No     
#> 10  2278 broom     -2535.    7965. No     
#> # … with 415 more rows

### now let us try this on grouped data

test4 <- tidyverse_cran_downloads %>%
    select(-date) %>% 
    group_by(package) %>% 
    anomalize(count)
#> Error in value[[3L]](cond): Error in prep_tbl_time(): No date or datetime column found.

print(test4)  ## now an error; what to do?
#> Error in print(test4): object 'test4' not found

Created on 2020-07-30 by the reprex package (v0.3.0)
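
A possible workaround, offered only as a sketch: since the ungrouped call in test3 succeeds, the same pipeline can be run once per package and the results row-bound. The error message suggests that the grouped code path goes through prep_tbl_time(), which expects a date column, so this sketch simply avoids passing a grouped data frame to anomalize(). The names `downloads`, `detect_one()`, and `test4_workaround` are made up for illustration and are not part of anomalize.

```r
library(tidyverse)
library(anomalize)

# Same preparation as in test3/test4: drop the date index
downloads <- tidyverse_cran_downloads %>%
    select(-date)

# Hypothetical helper: repeat the test3 pipeline for a single package
detect_one <- function(pkg) {
    downloads %>%
        filter(package == pkg) %>%
        anomalize(count)
}

# Apply the helper to each package and row-bind the per-package results
packages <- unique(downloads$package)
test4_workaround <- map_dfr(packages, detect_one)

print(test4_workaround)
```

Each package gets its own limits (count_l1, count_l2) this way, which is presumably what the grouped call was meant to produce.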