Open garfieldthesam opened 9 months ago
Hey @garfieldthesam ,
I understand why this is unexpected, but this is actually intentional.
At first when we introduced seasonality, users didn't configure the days_back
accordingly and kept seeing "not enough data to calculate an anomaly" result.
We did a naive solution -
I believe the more mature approach would be to increase the training set size only if needed, and only to reach the minimal training set size. For example -
For this configuration:
days_back: 60
seasonality: day_of_week
min_training_set_size: 14
We should not naively do days_back
*7, but do:
if min_training_set_size * 7 < days back (
days_back = min_training_set_size * 7
)
else days_back
@ellakz @dapollak what do you think?
@Maayan-s ah I see but one easy thing to fix: the docs you pictured above only say this applies just to day_of_week
and not hour_of_week
.
I believe the more mature approach would be to increase the training set size only if needed, and only to reach the minimal training set size.
I agree with this
But, regarding this proposal:
if min_training_set_size * 7 < days back (
days_back = min_training_set_size * 7
)
else days_back
I disagree. I am pretty sure that the way most users think of min_training_set_size
is as a scalar number that the elementary packages will never change. For example, if I set min_training_set_size = 70
, days_back = 70
, and seasonality=day_of_week
, I expect elementary to run the test using 70 days (ie. 10 weeks) of data.
It might be made clear in the docs that this isn't how min_training_set_size
works but not everyone is going to read the docs for every parameter they use, particularly if the name feels self-explanatory
Describe the bug
When using
hour_of_week
seasonality the training set goes back to 2016 in violation of thedays_back
and/ortraining_period
configurations.To Reproduce Steps to reproduce the behavior:
1. Run anomaly test without seasonality Example
Look at compiled code and see that in the
create table
statement's first CTE this is the code used to generate the date buckets:2. Run anomaly test with this seasonality
Example:
Look at compiled code and see that in the
create table
statement's first CTE this is the code used to generate the date buckets:Expected behavior I would not expect window for the query to extend back 6 years. One year should have 52 week-hour combinations and therefore be more than enough to generate a reasonably reliable seasonality adjustment.
6 years makes more sense if the test is for week-of-year-and-hour seasonality, not week-and-hour seasonality, but that doesn't make sense to me as a way to compute this seasonality and that's not my understanding of how it works after reading the Elementary docs.
Environment (please complete the following information):
dbt packages:
dbt
version 1.6.7dbt-databricks
version 1.6.6