business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

Error: Only year, quarter, month, week, and day periods are allowed for an index of class Date #38

Open kwythers opened 5 years ago

kwythers commented 5 years ago

I have arranged my own data into as close a tibble to the "tidyverse_cran_downloads" demonstration data as possible:

class(tidyverse_cran_downloads)
[1] "grouped_tbl_time" "tbl_time" "grouped_df" "tbl_df" "tbl" "data.frame"

class(isw_simple)
[1] "grouped_tbl_time" "tbl_time" "grouped_df" "tbl_df" "tbl" "data.frame"

glimpse(tidyverse_cran_downloads)
Observations: 6,375
Variables: 3
Groups: package [15]
$ date 2017-01-01, 2017-01-02, 2017-01-03, 2017-01-04, 2017-01-05, 2017-01-06, 2017-01-07, 2017-01-08, 2017-01-09...
$ count 873, 1840, 2495, 2906, 2847, 2756, 1439, 1556, 3678, 7086, 7219, 0, 5960, 2904, 2854, 5428, 6358, 6973, 661...
$ package "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr", "tidyr",...

glimpse(isw_tss)
Observations: 15,744
Variables: 4
Groups: staff_id_last_updt [9]
$ sample_date 2011-06-15, 2011-06-15, 2011-06-22, 2011-06-22, 2011-08-16, 2011-08-29, 2011-08-29, 2011-09-20,...
$ reported_value 68.0, 62.0, 38.0, 3.0, 35.1, 147.0, 147.0, 32.4, 1.0, 0.0, 0.0, 13.0, 130.0, 25.9, 10.4, 10.4, 2...
$ parameter_name "Solids, Total Suspended (TSS)", "Solids, Total Suspended (TSS)", "Solids, Total Suspended (TSS)...
$ staff_id_last_updt "bolafso", "bolafso", "bolafso", "bolafso", "bolafso", "bolafso", "bolafso", "bolafso", "bolafso...

As you can see, both 'class()' and 'glimpse()' show very similar structures, and I can replicate the results with the demonstration data just fine. However, when I try to apply the 'time_decompose()' function to my own data (isw_tss), I get the error message "Only year, quarter, month, week, and day periods are allowed for an index of class Date".

I am confused by this as my date data are in the ymd format (same as the demonstration data). Any thoughts would be much appreciated.

I have attached a sample data file isw_tss.txt

Here is the code I have modified up to the error message bits:

# load libraries
library(tidyverse)
library(tidyquant)
library(lubridate)
library(ggplot2)
library(ggpubr)
library(anomalize)
library(tibbletime)

# read in the data (forward slashes, since single backslashes are not valid in an R string)
isw_dmr <- read_csv('C:/Users/kwyther/export_isw_dmr.csv')

# change to lower case and remove rows with no reported value
isw_dmr <- rename_all(isw_dmr, tolower) %>%
  drop_na(reported_value)

# change sample_dates to date
isw_dmr$sample_date <- dmy(isw_dmr$sample_date)

# list of parameters
params <- isw_dmr %>% distinct(parameter_name)

# list of staff entering data
staff <- isw_dmr %>% distinct(staff_id_last_updt)

# simplify by parameter
# tss
isw_tss <- isw_dmr %>%
  select(sample_date, reported_value, parameter_name, staff_id_last_updt) %>%
  filter(parameter_name == 'Solids, Total Suspended (TSS)')

isw_tss <- isw_tss %>%
  group_by(staff_id_last_updt) %>%
  as_tbl_time(sample_date)

isw_tss <- isw_tss %>%
  arrange(sample_date, .by_group = TRUE)

isw_tss %>%
  ggplot(aes(sample_date, reported_value)) +
  geom_point(color = "#2c3e50", alpha = 0.25) +
  facet_wrap(staff_id_last_updt ~ .) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
  labs(title = "TSS reported values by staff", subtitle = "Data from ISW_DMRs")

isw_tss %>%
  # Data Manipulation / Anomaly Detection
  time_decompose(reported_value, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose() %>%
  # Anomaly Visualization
  plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
  labs(title = "TSS Anomalies", subtitle = "STL + IQR Methods")

kwythers commented 5 years ago

I might have figured this out... It turned out that I was running 'time_decompose()' on data with multiple observations on a single day. Once I filtered down to single locations and sites where only one observation per day was recorded, the error went away.
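
For anyone hitting the same error, here is a minimal sketch of how one might check for repeated dates within a group and collapse them to one row per day. It uses a small made-up tibble (`toy`) in place of the real data, with column names borrowed from the post above. Averaging with `mean()` is an assumption for illustration and is a different approach from the filtering described above; another summary, or filtering to a single site, may suit the data better.

```r
library(dplyr)

# Hypothetical toy data standing in for isw_tss: two staff groups,
# one of which has two observations on the same day.
toy <- tibble(
  staff_id_last_updt = c("a", "a", "a", "b", "b"),
  sample_date = as.Date(c("2011-06-15", "2011-06-15", "2011-06-22",
                          "2011-06-15", "2011-06-16")),
  reported_value = c(68, 62, 38, 3, 35.1)
)

# Step 1: list every group/date combination with more than one observation.
dupes <- toy %>%
  count(staff_id_last_updt, sample_date, name = "n_obs") %>%
  filter(n_obs > 1)

# Step 2: collapse to a single value per group per day.
# The mean is an arbitrary choice made for this sketch.
daily <- toy %>%
  group_by(staff_id_last_updt, sample_date) %>%
  summarise(reported_value = mean(reported_value), .groups = "drop")

print(dupes)
print(daily)
```

If `dupes` has no rows, duplicate dates are probably not the cause. Otherwise, `daily` could be regrouped with `group_by(staff_id_last_updt)` and converted with `as_tbl_time(sample_date)` before calling `time_decompose()`. Whether that is enough for irregularly spaced sampling dates has not been verified here.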