Closed jsta closed 2 months ago
I am working around this with the following:
def should_i_keep_it(sub_df):
    # Keep a group of records only if no row contains a NaN in any column.
    sub_d = sub_df.dropna()
    return sub_d.shape[0] == sub_df.shape[0]
cat.remove_incomplete(should_i_keep_it)
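To see what this predicate does without querying ESGF, here is a minimal sketch that applies it to two hand-built pandas DataFrames. The column names and values are illustrative stand-ins modeled on the summary in this issue, not real catalog output.

```python
import numpy as np
import pandas as pd


def should_i_keep_it(sub_df):
    # Keep a group of records only if no row contains a NaN in any column.
    sub_d = sub_df.dropna()
    return sub_d.shape[0] == sub_df.shape[0]


# Illustrative data: every record has a datetime_stop value.
complete = pd.DataFrame(
    {
        "variable_id": ["tas", "tas"],
        "datetime_stop": ["2099-12-16T12:00:00Z", "2099-12-16T12:00:00Z"],
    }
)

# Illustrative data: one record is missing datetime_stop, as in this issue.
incomplete = pd.DataFrame(
    {
        "variable_id": ["tas", "tas"],
        "datetime_stop": [np.nan, "2099-12-16T12:00:00Z"],
    }
)

keep_complete = should_i_keep_it(complete)
keep_incomplete = should_i_keep_it(incomplete)
print(keep_complete, keep_incomplete)
```

The predicate rejects the whole group as soon as a single row has a missing value, so the dataset from this issue would be dropped entirely rather than partially.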
Thank you for the report, and apologies for the trouble. I have been away from this for a few weeks, but will get back to it in the next week or so. It seems that some facet information is not available in the metadata record, so pandas fills it with NaNs. That later causes problems in the parts of the code that use those values, and there are probably other affected places. In your case it appears to happen when the keys of the dictionary are formed. This is slated for a rework, and I will take this problem into consideration.
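As a rough illustration of the failure mode described above, the sketch below builds dot-separated keys from DataFrame rows. This is not the actual intake-esgf code: the key format, the column selection, and the data are assumptions made only to show why a NaN (a float) breaks string-based key construction.

```python
import numpy as np
import pandas as pd

# Illustrative records: the first one is missing datetime_stop.
df = pd.DataFrame(
    {
        "source_id": ["IITM-ESM", "IITM-ESM"],
        "variable_id": ["tas", "tas"],
        "datetime_stop": [np.nan, "2099-12-16T12:00:00Z"],
    }
)

keys = []
errors = []
for _, row in df.iterrows():
    try:
        # Hypothetical key format: facet values joined with dots.
        keys.append(".".join(row.tolist()))
    except TypeError as exc:
        # str.join refuses non-string items, and NaN is a float.
        errors.append(str(exc))

print(keys)
print(errors)
```

Only the record with all string fields produces a key; the one with a NaN raises a `TypeError`. Dropping the optional columns before building keys, or not building keys from catalog columns at all, avoids this.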
This is fixed in v2024.7.15, in the sense that by default you will no longer have datetime_{start|stop} in the dataframe. The problem is that not all records in the ESGF database have these fields. The next thing on my list is to rework to_dataset_dict(), after which we won't be building dictionary keys from catalog columns. I will close this for now, as #62 should solve your problem.
intake_esgf.__version__
I believe that catalog.py has trouble when a field is NaN instead of a string, at approximately the line referenced below.
Here is the summary info for cat (note how one of the datetime_stop[s] is nan):
Summary information for 2 results:
institution_id [CCCR-IITM]
activity_drs [ScenarioMIP]
table_id [Amon]
experiment_id [ssp585]
source_id [IITM-ESM]
mip_era [CMIP6]
datetime_start [2015-01-17T00:00:00Z, 2015-01-16T12:00:00Z]
variable_id [tas]
grid_label [gn]
member_id [r1i1p1f1]
project [CMIP6]
datetime_stop [nan, 2099-12-16T12:00:00Z]