business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

The output of clean_anomalies() is a time tibble, which doesn't play nice with timetk::future_frame() #69

Status: Open. Opened by mabuimo 1 year ago.

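Hi, let's clean the anomalies in an example dataset:

# tidyverse_cran_downloads ships with the anomalize package
data("tidyverse_cran_downloads", package = "anomalize")

# Decompose each series, flag anomalies in the remainder, then clean them
cleaned <- anomalize::tidyverse_cran_downloads |>
  anomalize::time_decompose(count, method = "stl") |>
  anomalize::anomalize(remainder, method = "iqr") |>
  anomalize::clean_anomalies()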

The output is a time tibble (class `tbl_time`). Now try to extend each series by 12 points:

cleaned |>
  dplyr::group_by(package) |>
  timetk::future_frame(
    .date_var = date,
    .length_out = 12,
    .bind_data = FALSE
  ) |>
  dplyr::ungroup()

This fails with:

Error in `tbl_at_vars()`:
! Can't subset columns that don't exist.
✖ Column `package` doesn't exist.

The error goes away if you convert `cleaned` to a plain tibble with `as_tibble()` before extending it:

cleaned |>
  # Drop the tbl_time class so timetk receives an ordinary tibble
  tibble::as_tibble() |>
  dplyr::group_by(package) |>
  timetk::future_frame(
    .date_var = date,
    .length_out = 12,
    .bind_data = FALSE
  ) |>
  dplyr::ungroup()

Thank you, Matt!