Open behnam-zakeri opened 2 years ago
Sounds like a great idea, thanks @behnam-zakeri!
One minor comment: I would start this as a new method, maybe df.validate_outliers()
...
For future reference: #715 added a new require_data() method and #686 added a compute.quantile() method. These two methods could be useful starting points for implementing this feature.
It would be nice to have a feature/method to exclude outliers in timeseries data. This could be done, for example, as a new option under pyam.IamDataFrame().validate(). The outlier criterion could be calculated in either of two ways:

- Based on standard deviations, e.g. excluding values more than 3 SD from the mean:

      iam = iam.validate({"Price|Carbon": {"outlier": "3SD"}}, exclude_on_fail=True)

  In Python, this can be calculated as follows (df is a pandas.DataFrame):

      df = df[(df - df.mean()).abs() <= (3 * df.std())]

- Based on a percentile range:

      iam = iam.validate({"Price|Carbon": {"outlier": "[0.03, 0.98]"}}, exclude_on_fail=True)

  There are some suggestions on how to do this here: https://stackoverflow.com/questions/35827863/remove-outliers-in-pandas-dataframe-using-percentiles
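To make the two criteria concrete, here is a minimal pandas-only sketch of how each could be computed for one variable. The helper names `sd_outlier_mask` and `quantile_outlier_mask` are hypothetical and not part of pyam, and the carbon prices are made-up illustration data. How this would be wired into validate() or a new method is left open.

```python
import pandas as pd


def sd_outlier_mask(values: pd.Series, n_sd: float = 3.0) -> pd.Series:
    """Return True where a value lies within n_sd standard deviations of the mean."""
    return (values - values.mean()).abs() <= n_sd * values.std()


def quantile_outlier_mask(
    values: pd.Series, lower: float = 0.03, upper: float = 0.98
) -> pd.Series:
    """Return True where a value lies inside the [lower, upper] quantile range."""
    low_bound, high_bound = values.quantile([lower, upper])
    return values.between(low_bound, high_bound)


# Made-up "Price|Carbon" values across 21 scenarios; the last one is an obvious outlier
prices = pd.Series([float(v) for v in range(10, 30)] + [5000.0])

sd_mask = sd_outlier_mask(prices)        # True = keep
q_mask = quantile_outlier_mask(prices)   # True = keep

kept_by_sd = prices[sd_mask]
kept_by_quantile = prices[q_mask]
```

Note that the two criteria behave differently: the standard-deviation rule only drops values far from the mean, while a quantile range always trims some values at each end that has a bound strictly inside (0, 1), even when the data contain no extreme points. The standard-deviation rule can also fail on small samples, because a single extreme value inflates the standard deviation it is compared against.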