esgf2-us / intake-esgf

Programmatic access to the ESGF holdings
https://intake-esgf.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Enable shallow and deep queries #53

Open nocollier opened 2 months ago

nocollier commented 2 months ago

When looking for data in ESGF, a common way of working is to first search on a few facets and then use the unique values in each column to refine the search iteratively. This is currently very slow for initial queries that match many records, because calling search() performs the full query up front and retrieves every matching record.

The Solr indices take a long time to return the complete response. Globus is faster, but it still spends a lot of resources retrieving information we don't need in the early stages of a search.

Instead, we could have search() perform what I will call a shallow query: we request 0 records but ask the index for the unique facet values that match the search. We use this response to build the unique facet columns manually, and the underlying dataframe initially remains empty.
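A rough sketch of what a shallow query could look like against a Solr-backed index. It assumes the ESGF search REST API parameters `limit`, `facets`, and `format=application/solr+json`, and Solr's convention of returning facet counts as flat `[value, count, value, count, ...]` lists. The helper names `build_shallow_query` and `parse_facet_counts` are hypothetical and do not come from intake-esgf; a Globus index would need a differently shaped request.

```python
from typing import Dict, List, Sequence


def build_shallow_query(constraints: Dict[str, str],
                        facet_names: Sequence[str]) -> Dict[str, str]:
    """Hypothetical helper: request facet counts only, with zero records.

    Parameter names are assumed from the ESGF search REST API.
    """
    params = dict(constraints)
    params.update({
        "limit": "0",                      # return no records...
        "facets": ",".join(facet_names),   # ...but do return facet counts
        "format": "application/solr+json",
    })
    return params


def parse_facet_counts(response: dict) -> Dict[str, List[str]]:
    """Hypothetical helper: turn Solr's flat [value, count, ...] lists
    into the sorted unique values for each facet column."""
    fields = response["facet_counts"]["facet_fields"]
    unique: Dict[str, List[str]] = {}
    for name, flat in fields.items():
        values, counts = flat[0::2], flat[1::2]
        # Keep only values that actually match the current constraints.
        unique[name] = sorted(v for v, c in zip(values, counts) if c > 0)
    return unique


if __name__ == "__main__":
    params = build_shallow_query({"experiment_id": "historical"},
                                 ["source_id", "variable_id"])
    # A made-up response, shaped like a Solr facet reply, for illustration only.
    sample = {
        "response": {"numFound": 3, "docs": []},
        "facet_counts": {"facet_fields": {
            "source_id": ["UKESM1-0-LL", 2, "CESM2", 1],
            "variable_id": ["tas", 3, "pr", 0],
        }},
    }
    print(params)
    print(parse_facet_counts(sample))
```

The response's `numFound` would also tell us how many records a later deep query would pull, without transferring any of them.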

When the user references cat.df (either directly or indirectly by calling something that uses it, such as to_dataset_dict()), we then pay the price of the full (deep) search, hoping that by this point they have a better idea of what they need.
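One way to defer the deep search is to make `df` a cached property that only runs the full query the first time it is read. The sketch below is a hypothetical `LazyCatalog` rather than intake-esgf's actual catalog class. It assumes the shallow facet summary and a `deep_search` callable are supplied from outside, and it stores records as a list of dicts instead of a pandas dataframe so that it stays self-contained.

```python
from typing import Callable, Dict, List, Optional


class LazyCatalog:
    """Hypothetical catalog: shallow facet summary now, deep search on demand."""

    def __init__(self, shallow_facets: Dict[str, List[str]],
                 deep_search: Callable[[], List[dict]]):
        self.unique = shallow_facets      # unique values from the shallow query
        self._deep_search = deep_search   # runs the full (expensive) query
        self._df: Optional[List[dict]] = None
        self.deep_calls = 0               # counts how often the deep query ran

    @property
    def df(self) -> List[dict]:
        # The full search is paid for only the first time df is referenced;
        # later accesses reuse the cached result.
        if self._df is None:
            self._df = self._deep_search()
            self.deep_calls += 1
        return self._df

    def to_dataset_dict(self) -> Dict[str, dict]:
        # Indirect access: any method that reads self.df triggers the deep query.
        return {rec["id"]: rec for rec in self.df}
```

With this design, refining facets through `cat.unique` stays cheap, and the deep query runs once, whether the first access is `cat.df` itself or a method such as `to_dataset_dict()`.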