elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

[ELE-1443] Adding hardcoded lower (or upper) bounds to the anomaly bracket. #1034

Open ari-nz opened 1 year ago

ari-nz commented 1 year ago

Hi Team,

Is your feature request related to a problem? Please describe.

I sometimes find that, due to the stochasticity of the data, the anomaly checks produce 'nonsense' results.

For example, here is a volume row count check

You can see that the data would only be anomalous if there were negative rows in the table.
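
To make the problem concrete, here is a minimal sketch of how the expected range can dip below zero. It assumes the bracket is computed as mean plus or minus sensitivity times the standard deviation of recent values, which may not match Elementary's exact calculation; the function name `anomaly_bracket` and the row counts are made up for illustration.

```python
from statistics import mean, stdev


def anomaly_bracket(history, sensitivity):
    """Return (lower, upper) as mean -/+ sensitivity * sample standard deviation.

    Assumption: a plain z-score style bracket, used here only to illustrate
    the issue; this is not Elementary's actual code.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - sensitivity * sigma, mu + sensitivity * sigma


# Hypothetical daily row counts with a lot of day-to-day variation.
row_counts = [120, 15, 300, 40, 210, 5, 180]

lower, upper = anomaly_bracket(row_counts, sensitivity=2.5)

# With noisy data the lower edge falls below zero, so a row count could
# only be flagged as "too low" if it were negative, which is impossible.
print(f"expected range: {lower:.1f} to {upper:.1f}")
print("lower edge is negative:", lower < 0)
```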

Describe the solution you'd like

It would be awesome if it was possible to specify sensible limits (and maybe even have them default?)

Something like

- elementary.volume_anomalies:
    anomaly_sensitivity: 2.5
    lower_bound: 0
    upper_bound: 100000

And the display and/or alerts would take the value that makes sense. In the case of lower bounds either 0 or a positive integer (if that's what is calculated). For upper bounds, you'd display 100,000 or a lower value if that's what was calculated.
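
The "take the value that makes sense" rule could be sketched as follows: the effective lower edge is the larger of the calculated value and `lower_bound`, and the effective upper edge is the smaller of the calculated value and `upper_bound`. The helper `clamp_bracket` is hypothetical and only mirrors the parameter names proposed in the config above.

```python
def clamp_bracket(lower, upper, lower_bound=None, upper_bound=None):
    """Tighten a calculated (lower, upper) bracket with optional hard limits.

    Hypothetical helper: a limit only applies when it is stricter than the
    calculated edge, so a sensible calculated value is left untouched.
    """
    if lower_bound is not None:
        lower = max(lower, lower_bound)
    if upper_bound is not None:
        upper = min(upper, upper_bound)
    return lower, upper


# Calculated lower edge is negative, so the hard lower bound of 0 wins;
# the calculated upper edge is already below 100000, so it is kept.
print(clamp_bracket(-154.5, 403.1, lower_bound=0, upper_bound=100000))  # (0, 403.1)

# Calculated lower edge is a positive number, so it is kept;
# the calculated upper edge exceeds the hard limit, so 100000 wins.
print(clamp_bracket(50.0, 250000.0, lower_bound=0, upper_bound=100000))  # (50.0, 100000)
```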

Fundamentally there are some values that should auto-default to these sorts of limits:

Would you be willing to contribute this feature?

Happy to help try to implement something if it doesn't go against any key principles you have :)


Maayan-s commented 1 year ago

I totally agree with you that we have a problem there, @ari-nz. We have this issue that we haven't addressed yet.

I agree that hardcoded bounds are the simple solution, but I still think we might be missing a way that fits our approach: these tests are not assertions and shouldn't require you to configure hardcoded limits.

ari-nz commented 1 year ago

861 must have slipped through my search net, apologies.

I agree that allowing setting arbitrary limits sort of goes against the philosophy; however for those fundamental values that are checked, would it make sense to instead have some sort of flag that could be invoked as a setting to at least indicate that 'logical settings' should apply?

Counts >= 0
0 <= Percentages <= 1
Variance/StdDev >= 0
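
As a rough sketch of what such a flag could do, the limits listed above can be expressed as a lookup from metric kind to its logically possible range, applied on top of whatever bracket was calculated. The names `LOGICAL_BOUNDS` and `apply_logical_bounds`, and the metric kind strings, are assumptions for illustration and are not part of Elementary.

```python
# Hypothetical mapping from metric kind to its logically possible range.
# None means "no limit on that side".
LOGICAL_BOUNDS = {
    "count": (0, None),       # Counts >= 0
    "percentage": (0, 1),     # 0 <= Percentages <= 1
    "variance": (0, None),    # Variance >= 0
    "stddev": (0, None),      # StdDev >= 0
}


def apply_logical_bounds(metric_kind, lower, upper):
    """Clip a calculated (lower, upper) bracket to the metric's logical range.

    Unknown metric kinds are returned unchanged, so no limit is imposed
    unless one is known to be safe.
    """
    logical_lower, logical_upper = LOGICAL_BOUNDS.get(metric_kind, (None, None))
    if logical_lower is not None:
        lower = max(lower, logical_lower)
    if logical_upper is not None:
        upper = min(upper, logical_upper)
    return lower, upper


print(apply_logical_bounds("count", -154.5, 403.1))      # (0, 403.1)
print(apply_logical_bounds("percentage", -0.2, 1.3))     # (0, 1)
print(apply_logical_bounds("freshness", -10.0, 20.0))    # (-10.0, 20.0)
```

Because the limits follow from the metric's definition rather than from the user's expectations, nothing has to be configured per test, which keeps the tests from turning into assertions.
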

Hadarsagiv commented 1 year ago

This could also be solved in other ways, such as adding a trend check.

Slack link: https://elementary-community.slack.com/archives/C02CTC89LAX?cid=C02CTC89LAX&thread_ts=1682076795.443979

elongl commented 12 months ago

I agree that allowing setting arbitrary limits sort of goes against the philosophy; however for those fundamental values that are checked, would it make sense to instead have some sort of flag that could be invoked as a setting to at least indicate that 'logical settings' should apply?

Counts >= 0
0 <= Percentages <= 1
Variance/StdDev >= 0

Hi @ari-nz, we've applied the changes you suggested in this PR! They'll be released in the upcoming version.