ESGF / esgf-download

ESGF data transfer and replication tool
https://esgf.github.io/esgf-download/
BSD 3-Clause "New" or "Revised" License

Unrecognized facet name requires adding the facet manually to selection.py #3

Open AtefBN opened 1 year ago

AtefBN commented 1 year ago
[2023-04-19 17:28:02]  DEBUG     root
Locals:
{
    'self': Selection(
        driving_model='MOHC-HadGEM2-ES',
        ensemble='r1i1p1',
        experiment='rcp26',
        project='CORDEX'
    ),
    'name': 'rcm_version',
    'value': ['v2']
}

[2023-04-19 17:28:02]  ERROR     root

Traceback (most recent call last):
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/tui.py", line 154, in logging
    yield
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/search.py", line 69, in search
    query = parse_query(
            ^^^^^^^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/utils.py", line 175, in parse_query
    selection = parse_facets(facets)
                ^^^^^^^^^^^^^^^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/utils.py", line 155, in parse_facets
    selection[name] = values
    ~~~~~~~~~^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/models/selection.py", line 105, in __setitem__
    raise KeyError(name)
KeyError: 'rcm_version'

JoranAngevaare commented 1 year ago

I think I encountered a similar issue following the search page of the documentation.

Following the documentation, I wanted to query by version. The --facets flag lists version as an available facet, and the --hints flag lists the available versions:

(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon --facets | tail -3
  "variant_label",
  "version"
]
(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon --hints version | tail
      "20210826": 7,
      "20211229": 3,
      "20220105": 1,
      "20220426": 1,
      "20220720": 2,
      "20220721": 6,
      "20220722": 606
    }
  }
]

Yet, building the query leads to a key error:

(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon version:20220722
KeyError: 'version'
See /data/ssd/joran/esgpull/log/esgpull-search-2023-05-04_07-17-47.log for error log.
Aborted!

Full traceback

(py310) [angevaar@pc160101 joran]$ less /data/ssd/joran/esgpull/log/esgpull-search-2023-05-04_07-17-47.log
[2023-05-04 09:17:47]  DEBUG     root
Locals:
{'self': Selection(frequency='mon', institution_id='IPSL', project='CMIP6', variable_id='tas'), 'name': 'version', 'value': ['20220722']}

[2023-05-04 09:17:47]  ERROR     root

Traceback (most recent call last):
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/tui.py", line 154, in logging
    yield
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/search.py", line 69, in search
    query = parse_query(
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/utils.py", line 175, in parse_query
    selection = parse_facets(facets)
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/utils.py", line 155, in parse_facets
    selection[name] = values
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/models/selection.py", line 116, in __setitem__
    raise KeyError(name)
KeyError: 'version'

Build info

## Version
(py310) [angevaar@pc160101 joran]$ pip list | grep esg
esgpull              0.4.0
## Build method
conda install esgpull=0.4.0 --channel ipsl --channel conda-forge
## OS
(py310) [angevaar@pc160101 joran]$ cat /etc/os-release | head -2
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"

svenrdz commented 1 year ago

Sorry, I just realized I forgot to enable notifications on this repo, since it was moved to the ESGF organization.

Currently, the list of facet keys that can be used in a Query is hard-coded in this file: https://github.com/ESGF/esgf-download/blob/main/esgpull/models/selection.py#L158-L196

The --hints flag shows anything returned by the search API, and therefore is not directly linked to the list of "valid" facet keys.

For facet values, there is no such hard constraint. I relaxed it after realizing it prevented using some of esgpull's search features inside a saved Query (i.e. wildcard syntax).
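To illustrate the behaviour described above, here is a minimal sketch of key validation against a hard-coded allow-list. It is not esgpull's actual code: the `FacetSelection` class, the `ALLOWED_FACETS` set, and the `try_set` helper are hypothetical names made up for this example. Only the general shape (a `__setitem__` that raises `KeyError` for unknown names, with no check on values) follows the tracebacks and the explanation in this thread.

```python
# Hypothetical illustration; none of these names come from esgpull.
ALLOWED_FACETS = frozenset({"project", "experiment", "ensemble", "driving_model"})


class FacetSelection:
    """Toy selection that validates facet keys against a fixed allow-list."""

    def __init__(self, allowed=ALLOWED_FACETS):
        self._allowed = allowed
        self._values = {}

    def __setitem__(self, name, values):
        # Keys are checked against the hard-coded list ...
        if name not in self._allowed:
            raise KeyError(name)
        # ... but values are stored as given (e.g. wildcards pass through).
        self._values[name] = list(values)

    def __getitem__(self, name):
        return self._values[name]


def try_set(selection, name, values):
    """Return True if the facet was accepted, False if the key was rejected."""
    try:
        selection[name] = values
    except KeyError:
        return False
    return True
```

With this sketch, `try_set(FacetSelection(), "rcm_version", ["v2"])` returns `False` because `rcm_version` is not in the allow-list, even if a search API would report it as a facet.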

Since this seems to be a recurring issue, we should definitely improve this validation, and I can think of a few ways:

In the meantime, there is a workaround specifically for version, since a dataset's version is related to its publication date: for non-replica datasets, the publication date should in theory match the version. You can use the --from and --to parameters, which filter on a dataset's publication date and take a date in the format YYYY-MM-dd. This currently works only with the search command, but including those parameters in saved queries with the add command is on my todo list.
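As a rough sketch of that workaround, the helper below turns a version string such as 20220722 into the date format expected by --from and --to, and builds an argument list for the search command. The function names are hypothetical, and using the same day for both bounds assumes the date filter is inclusive, which is not confirmed in this thread.

```python
from datetime import datetime


def version_to_date(version: str) -> str:
    """Convert a 'YYYYMMDD' version string to 'YYYY-MM-DD'.

    Raises ValueError if the string is not a valid date.
    """
    return datetime.strptime(version, "%Y%m%d").strftime("%Y-%m-%d")


def search_args_for_version(facets, version):
    """Build an `esgpull search` argument list filtering on publication date.

    Assumption: passing the same day to --from and --to selects datasets
    published on that day (inclusive bounds are not verified here).
    """
    day = version_to_date(version)
    return ["esgpull", "search", *facets, "--from", day, "--to", day]
```

For example, `search_args_for_version(["project:CMIP6", "variable_id:tas"], "20220722")` produces a list that could be handed to `subprocess.run`, or joined with spaces and typed at a shell prompt.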