IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0
Filter strict - only return data which matches exact combination of all arguments #740

Closed: willu47 closed this issue 3 months ago

willu47 commented 1 year ago

I'm dealing with some messy data, including models that have missing regions, variables and years.

If I filter on a subset of regions, I get back all the data that matches at least one of those regions (and the same holds for the other dimensions).

However, I only want to return the model, scenario and variable rows which match ALL the regions I pass in.

One idea is to add a strict flag (default False) to the .filter() method. If strict is True, data is returned for a model-scenario-variable group only if that group matches all the arguments.
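
To make the proposed semantics concrete, here is a minimal pure-Python sketch of what such a strict filter might do. It is not pyam code: the function name strict_filter, the row layout, and the example data are all hypothetical, chosen only for illustration. A model-scenario-variable group is kept only if it has data for every requested region.

```python
from collections import defaultdict


def strict_filter(rows, regions):
    """Keep rows of (model, scenario, variable) groups that cover ALL `regions`.

    Hypothetical helper for illustration; not part of the pyam API.
    `rows` is a list of dicts with the keys
    'model', 'scenario', 'variable', 'region', 'year' and 'value'.
    """
    wanted = set(regions)
    # Collect the regions present in each model-scenario-variable group
    seen = defaultdict(set)
    for row in rows:
        key = (row["model"], row["scenario"], row["variable"])
        seen[key].add(row["region"])
    # A group passes only if every requested region is present
    complete = {key for key, found in seen.items() if wanted <= found}
    return [
        row
        for row in rows
        if (row["model"], row["scenario"], row["variable"]) in complete
        and row["region"] in wanted
    ]


def make_row(model, variable, region):
    # Made-up example values; only the index columns matter here
    return {"model": model, "scenario": "scen_1", "variable": variable,
            "region": region, "year": 2030, "value": 1.0}


rows = [
    make_row("model_a", "Primary Energy", "Europe"),
    make_row("model_a", "Primary Energy", "Asia"),
    make_row("model_b", "Primary Energy", "Europe"),  # Asia is missing
    make_row("model_a", "Emissions", "Europe"),
    make_row("model_a", "Emissions", "Asia"),
    make_row("model_a", "Emissions", "Africa"),  # region not requested
]

result = strict_filter(rows, ["Europe", "Asia"])
```

A non-strict filter on the same regions would also return the model_b row for Europe; the strict version drops that group entirely because it has no data for Asia.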

willu47 commented 1 year ago

The .require_data() method returns a pandas.DataFrame of the model-scenario pairs which do not meet the criteria given (or None if every scenario passes). This could be used internally to return only data which meets all the criteria.

danielhuppmann commented 1 year ago

If I understand you correctly, you want to have the data only for those model-scenario combinations where data for all regions is present, right?

Maybe the following can be helpful:

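# Restrict to the variables of interest (filter returns a new object)
_df = df.filter(variable=[<list of variables>])

# Flag every scenario that lacks data for at least one required region
for r in [<list of regions>]:
    _df.require_data(region=r, exclude_on_fail=True)

# Keep only the scenarios that passed all region requirements
_df.filter(exclude=False, inplace=True)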

The exclude_on_fail flag (available in all validation methods) sets the meta-indicator exclude to True for all scenarios that fail the validation. You can call require_data() once per region and then remove all scenarios that failed at least one region requirement.
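
The flag-then-filter pattern described above can also be sketched in plain Python, independent of pyam. The names data, exclude and require_region are hypothetical stand-ins for the scenario data, the exclude meta-indicator and a single validation call; this only illustrates the idea and is not how pyam implements it.

```python
# Regions with data per (model, scenario); hypothetical example data
data = {
    ("model_a", "scen_1"): {"Europe", "Asia"},
    ("model_b", "scen_1"): {"Europe"},
    ("model_c", "scen_1"): {"Asia"},
}

# Meta-indicator: every scenario starts out as not excluded
exclude = {key: False for key in data}


def require_region(region):
    """Flag scenarios without data for `region` and return the failing keys."""
    failed = [key for key, regions in data.items() if region not in regions]
    for key in failed:
        exclude[key] = True  # flags accumulate across repeated calls
    return failed


# One validation call per required region
for r in ["Europe", "Asia"]:
    require_region(r)

# Keep only the scenarios that never failed a requirement
kept = {key: regions for key, regions in data.items() if not exclude[key]}
```

Because the flags accumulate, a scenario that fails any single region requirement stays excluded, which is what makes the final filter step behave like an "all regions" condition at the model-scenario level.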

danielhuppmann commented 1 year ago

Of course, a strict-filter option could be useful, but the devil is in the details of the implementation, I guess...

danielhuppmann commented 3 months ago

Can I close this issue @willu47?

willu47 commented 3 months ago

Yep