willu47 closed this issue 3 months ago
The .require_data() method returns a pandas.DataFrame of the model-scenario pairs that are missing the required data (or None if nothing is missing). This could be used internally to return only data which meets all the criteria.
If I understand you correctly, you want to have the data only for those model-scenario combinations where data for all regions is present, right?
Maybe the following can be helpful:
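_df = df.filter(variable=[<list of variables>])
# flag every scenario that lacks data for any one of the regions
for r in [<list of regions>]:
    _df.require_data(region=r, exclude_on_fail=True)
# keep only the scenarios that passed every check
_df.filter(exclude=False, inplace=True)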
The exclude_on_fail flag (available in all validation methods) sets the meta-indicator exclude to True for all scenarios that fail the validation. You can call require_data() once per region and then remove all scenarios that failed at least one region requirement.
Of course, a strict-filter option could be useful, but the devil is in the details of the implementation, I guess...
Can I close this issue @willu47?
Yep
I'm dealing with some messy data, including models that have missing regions, variables and years.
If I filter on a subset of regions, I get back all the data that matches at least one of those regions (and the same goes for the other dimensions).
However, I only want to return the model, scenario and variable rows which match ALL the regions I pass in.
One idea is to add a strict=False flag to the .filter() method. If strict == True, then only data which matches all the arguments is returned for each model-scenario-variable group.
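To make the intended behaviour concrete, here is a rough sketch in plain Python of what strict matching on regions could look like. It is not pyam code: strict_filter, the list-of-dicts row layout and the toy rows are all made up for illustration, and a real implementation would also have to decide how strict applies to variables and years.

```python
from collections import defaultdict


def strict_filter(rows, regions, group_keys=("model", "scenario", "variable")):
    """Keep rows only for groups that have data for ALL requested regions.

    Hypothetical helper, not part of pyam; `rows` is a list of dicts.
    """
    required = set(regions)

    # Collect which of the requested regions are present in each group.
    seen = defaultdict(set)
    for row in rows:
        if row["region"] in required:
            seen[tuple(row[k] for k in group_keys)].add(row["region"])

    # A group passes only if every requested region was found.
    complete = {group for group, found in seen.items() if found == required}

    return [
        row
        for row in rows
        if row["region"] in required
        and tuple(row[k] for k in group_keys) in complete
    ]


# Toy data (invented): model m2 has no data for Asia.
rows = [
    {"model": "m1", "scenario": "s1", "variable": "Population", "region": "Europe", "value": 1.0},
    {"model": "m1", "scenario": "s1", "variable": "Population", "region": "Asia", "value": 2.0},
    {"model": "m2", "scenario": "s1", "variable": "Population", "region": "Europe", "value": 3.0},
]

result = strict_filter(rows, ["Europe", "Asia"])
```

With these rows, only the two m1 rows survive, because m2 has no data for Asia. A non-strict filter on the same regions would also return the m2 row for Europe.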