IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Support validation of values in `require_data()` #769

Closed: danielhuppmann closed this issue 6 months ago

danielhuppmann commented 10 months ago

In @l-welder & @coroa's pathways-ensemble-analysis tool, they implemented a utility to check the values of timeseries data (in conjunction with filters).

This feature is similar to the pyam validate() method. Both of these approaches could be streamlined into the require_data() method so that a user can write:

df.require_data(variable="<variable>", upper_bound=..., lower_bound=...)

Bonus question: how should this method behave if the variable does not exist? Should there be an argument to select the right approach? Or should require_data() mean that the datapoint(s) must exist and lie within the range, while a new validate_data() means that the values must be within the range only if they exist for these filters?
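To make the two candidate semantics concrete, here is a minimal pure-Python sketch. It does not use pyam; the `DATA` dictionary, the scenario names, and the `failing_scenarios` helper with its `must_exist` flag are hypothetical and only illustrate the difference between "must exist and be in range" and "must be in range if it exists".

```python
# Hypothetical stand-in for timeseries data: (scenario, variable, year) -> value
DATA = {
    ("scen_a", "Primary Energy", 2030): 550.0,
    ("scen_b", "Primary Energy", 2030): 720.0,
    # "scen_c" does not report "Primary Energy" at all
    ("scen_c", "Emissions|CO2", 2030): 30.0,
}
SCENARIOS = ["scen_a", "scen_b", "scen_c"]


def _out_of_bounds(value, upper_bound, lower_bound):
    """Return True if the value violates either (optional) bound."""
    if upper_bound is not None and value > upper_bound:
        return True
    if lower_bound is not None and value < lower_bound:
        return True
    return False


def failing_scenarios(data, scenarios, variable,
                      upper_bound=None, lower_bound=None, must_exist=True):
    """List scenarios that fail the check (hypothetical helper, not pyam API).

    must_exist=True  : missing data counts as a failure ("require" semantics)
    must_exist=False : missing data is skipped ("validate" semantics)
    """
    failed = []
    for scenario in scenarios:
        values = [
            value for (scen, var, _year), value in data.items()
            if scen == scenario and var == variable
        ]
        if not values:
            if must_exist:
                failed.append(scenario)
            continue
        if any(_out_of_bounds(v, upper_bound, lower_bound) for v in values):
            failed.append(scenario)
    return failed


# "require" semantics: scen_b is out of range, scen_c has no data
required = failing_scenarios(
    DATA, SCENARIOS, "Primary Energy", upper_bound=600, must_exist=True
)
# "validate" semantics: only scen_b fails, scen_c is skipped
validated = failing_scenarios(
    DATA, SCENARIOS, "Primary Energy", upper_bound=600, must_exist=False
)
```

With this toy data, the only scenario on which the two readings disagree is the one that does not report the variable.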

fyi @phackstock

coroa commented 10 months ago

I actually did not contribute much to the tool.

Agreed that require_data and validate are closely related. There could be an on_fail: Literal["raise", "exclude", "ignore"] argument (with "raise" as the default), where "raise" keeps the current exception behaviour, "exclude" sets the flag in exclude as validate does, and "ignore" lets the user ask whether the data is there (and within bounds) without side effects.
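A rough sketch of how such an on_fail switch might dispatch, written in plain Python rather than against pyam. The `check_bounds` function, its signature, and the `exclude` dictionary are assumptions made for illustration; nothing here reflects an actual pyam implementation.

```python
from typing import Dict, List, Literal, Optional

OnFail = Literal["raise", "exclude", "ignore"]


def check_bounds(
    values: Dict[str, float],
    exclude: Dict[str, bool],
    upper_bound: Optional[float] = None,
    lower_bound: Optional[float] = None,
    on_fail: OnFail = "raise",
) -> List[str]:
    """Return the keys whose value is out of bounds (hypothetical helper).

    "raise"   : raise a ValueError if any value fails
    "exclude" : mark failing keys as True in the `exclude` mapping
    "ignore"  : only report failing keys, without side effects
    """
    if on_fail not in ("raise", "exclude", "ignore"):
        raise ValueError(f"Unknown on_fail option: {on_fail!r}")

    failed = [
        key for key, value in values.items()
        if (upper_bound is not None and value > upper_bound)
        or (lower_bound is not None and value < lower_bound)
    ]

    if failed and on_fail == "raise":
        raise ValueError(f"Values out of bounds for: {failed}")
    if on_fail == "exclude":
        for key in failed:
            exclude[key] = True
    return failed


values = {"scen_a": 550.0, "scen_b": 720.0}
exclude = {"scen_a": False, "scen_b": False}

# "ignore": report the failing scenario, leave `exclude` untouched
ignored = check_bounds(values, exclude, upper_bound=600, on_fail="ignore")
exclude_after_ignore = dict(exclude)

# "exclude": same report, but the failing scenario is now flagged
excluded = check_bounds(values, exclude, upper_bound=600, on_fail="exclude")
```

In this sketch "ignore" and "exclude" return the same list; they differ only in whether the `exclude` mapping is modified.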

danielhuppmann commented 10 months ago

> There could be an on_fail: Literal["raise", "exclude", "ignore"] argument.

Nice, but I would use "warn" instead of "ignore"...

phackstock commented 10 months ago

Not sure I understand the use of "ignore" or "warn". Would that just issue a warning if something is missing? If so, isn't the point of require_data to ensure that certain data is present before continuing? In that case I think "ignore" or "warn" might defeat the purpose. In any case, you can still use a try/except clause to achieve the same effect if it's needed.
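For illustration, the try/except route could look roughly like the sketch below. `require_value` is a hypothetical strict check that raises on missing data; it stands in for whatever raising behaviour the method would have and is not a pyam function.

```python
import warnings


def require_value(data, key):
    """Hypothetical strict check: raise if the required key is missing."""
    if key not in data:
        raise ValueError(f"Required data missing: {key}")
    return data[key]


def require_value_or_warn(data, key):
    """Downgrade the strict check to a warning on the caller's side."""
    try:
        return require_value(data, key)
    except ValueError as err:
        # The caller decides that a missing value is not fatal here
        warnings.warn(str(err))
        return None
```

The strict method keeps its guarantee by default, and the decision to continue despite missing data stays explicit in the calling code.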

danielhuppmann commented 6 months ago

Validation of values is now implemented in the revised validate() method.