The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
I have a problem with the ignore_small_changes params: when I set spike_failure_percent_threshold to 2%, I get a very strange result for the threshold range in my volume anomaly test, literally a 2-fold increase over the training_avg value.
So after a little research, it seems this is caused by an operator-precedence problem. Let me explain:
The max value for the volume anomaly test is calculated via this function: get_anomaly_scores_query.sql (tell me if I'm saying something wrong).
The max value is computed with the following code :
and the elementary.arithmetic_max function is used. This function is relatively simple and returns, as its name suggests, the larger of spike_avg_threshold and max_val (i.e. f(x,y) = 0.5(x + y + |x − y|)).
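For reference, the identity can be sketched in Python (this is a hypothetical helper mirroring the formula above, not the actual macro code):

```python
def arithmetic_max(x, y):
    # max(x, y) expressed arithmetically: 0.5 * (x + y + |x - y|)
    return 0.5 * (x + y + abs(x - y))

# The identity returns the larger of the two inputs
assert arithmetic_max(3.0, 7.0) == 7.0
assert arithmetic_max(-5.0, -2.0) == -2.0
assert arithmetic_max(4.0, 4.0) == 4.0
```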
If we now look at the final calculation in the SQL query generated on BigQuery, we obtain the following field:
(0.5 * (((1 + 100/100.0) * training_avg) + 2 * training_stddev + training_avg + abs(((1 + 100/100.0) * training_avg) - 2 * training_stddev + training_avg))) end as max_metric_value,
where spike_avg_threshold = ((1 + 100/100.0) * training_avg) = x and max_val = 2 * training_stddev + training_avg = y.
But we can now easily see that the calculation goes wrong because of the "2 * training_stddev + training_avg" part: without parentheses, operator precedence applies the subtraction inside abs() to 2 * training_stddev only, and training_avg is then added "at the end", so the abs() argument evaluates as (x - 2 * training_stddev) + training_avg instead of x - (2 * training_stddev + training_avg). This makes the calculation wrong and returns something other than the expected result, i.e. the larger of spike_avg_threshold and max_val.
For example, if I set spike_failure_percent_threshold: 2, I get an acceptance margin multiplied by 2 (100%) and not just 2%.
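The effect is easy to reproduce numerically. Here is a small Python sketch of the generated expression, with assumed sample values (training_avg = 100, training_stddev = 10, spike_failure_percent_threshold = 2); the numbers are illustrative, not taken from a real run:

```python
training_avg = 100.0
training_stddev = 10.0
t = 2  # spike_failure_percent_threshold (percent)

spike_avg_threshold = (1 + t / 100.0) * training_avg  # x = 102.0
max_val = 2 * training_stddev + training_avg          # y = 120.0

# Buggy: same grouping as the generated SQL. Inside abs(), precedence makes
# this (x - 2*stddev) + avg rather than x - (2*stddev + avg).
buggy = 0.5 * (spike_avg_threshold + 2 * training_stddev + training_avg
               + abs(spike_avg_threshold - 2 * training_stddev + training_avg))

# Fixed: parentheses restore the intended max(x, y)
fixed = 0.5 * (spike_avg_threshold + (2 * training_stddev + training_avg)
               + abs(spike_avg_threshold - (2 * training_stddev + training_avg)))

print(buggy)  # 202.0 -- roughly 2x training_avg, matching the observed behavior
print(fixed)  # 120.0 -- max(102.0, 120.0), as intended
```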
Logically, it would "be enough" to add parentheses around this problematic part to fix the error, but I wanted to explain it to make sure I hadn't missed anything or made a mistake.
I'm available if my explanation isn't clear.
Good day to you.
Hi @Emmanuel2RPA !
Thanks for opening this and for the detailed explanation! I believe this bug was actually fixed here; we haven't released it in an official version just yet, but we will soon.
There were indeed missing parentheses.
Hello, this is a copy of my message on Slack.
Platform: BigQuery