ecmwf-projects / cads-adaptors

CADS data retrieval adaptors and associated methods and tools
Apache License 2.0

URL adaptor does not find all matches #138

Closed EddyCMWF closed 4 months ago

EddyCMWF commented 4 months ago

The URL adaptor does not match all potential URLs when performing the Jinja templating. This is because a request can contain items which are not relevant to the granule being requested, and this invalidates the conditionals in the Jinja template.

The underlying problem is that the unfactorise method assumes that all the valid requests share the same set of keys. This makes for a very fast algorithm, but it breaks down when we have a catalogue entry where some components do not require a given key (see the example below, which will be added to a test in a develop branch).
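To make the failure mode concrete, here is a minimal sketch, not the actual cads-adaptors code. The `unfactorise` function below is a simplified stand-in that expands a request with a Cartesian product, and `render_url` is a hypothetical plain-Python substitute for a Jinja conditional that expects `energy_product_type` to be absent for climate variables. The file-name pattern is invented for illustration.

```python
from itertools import product


def unfactorise(request):
    """Simplified stand-in: expand a request of lists into single-valued dicts.

    Every resulting dict carries the same keys, which is the uniform-key assumption.
    """
    keys = list(request)
    return [dict(zip(keys, values)) for values in product(*request.values())]


def render_url(granule):
    """Hypothetical stand-in for the template conditional (not the real template)."""
    has_product_type = granule.get("energy_product_type") is not None
    if granule["variable"] == "electricity_demand":
        # Energy variables require an energy_product_type.
        if not has_product_type:
            return None
        return "{variable}_{energy_product_type}_{experiment}.nc".format(**granule)
    # Climate variables are only matched when energy_product_type is absent.
    if has_product_type:
        return None
    return "{variable}_{experiment}.nc".format(**granule)


request = {
    "variable": ["surface_downwelling_shortwave_radiation", "electricity_demand"],
    "energy_product_type": ["energy"],
    "experiment": ["rcp_4_5"],
}

granules = unfactorise(request)
# Drop granules for which the stand-in conditional produced no URL.
urls = [url for url in map(render_url, granules) if url is not None]
```

Both expanded granules carry `energy_product_type`, so under these assumptions the radiation granule produces no URL even though the request clearly asks for it.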

One approach to solving this (as is done in the legacy system) would be to convert the request into a list of valid requests using the constraints, i.e. take the intersection of the request with each constraint and remove any duplicates at the end. We would then wrap the unfactorise loop (https://github.com/ecmwf-projects/cads-adaptors/blob/main/cads_adaptors/tools/url_tools.py#L29) in a loop over this list of valid requests, so that the assumption of uniform keys holds within each valid request.
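The proposed approach could be sketched roughly as follows. This is an illustration under stated assumptions, not the legacy or current implementation. The names `intersect_with_constraints` and `expand` are hypothetical, and so are the two constraints. The sketch also assumes that a constraint only applies when the request supplies all of its keys.

```python
from itertools import product


def intersect_with_constraints(request, constraints):
    """Intersect the request with each constraint, dropping duplicates."""
    valid_requests = []
    for constraint in constraints:
        # Assumption: skip constraints whose keys are not all in the request.
        if not set(constraint) <= set(request):
            continue
        # Keep only the constraint's keys, with the values the request asked for.
        valid = {
            key: [value for value in request[key] if value in allowed]
            for key, allowed in constraint.items()
        }
        # An empty list for any key means nothing matches this constraint.
        if all(valid.values()) and valid not in valid_requests:
            valid_requests.append(valid)
    return valid_requests


def expand(valid_request):
    """Cartesian expansion; keys are uniform within one valid request."""
    keys = list(valid_request)
    return [dict(zip(keys, combo)) for combo in product(*valid_request.values())]


request = {
    "variable": ["surface_downwelling_shortwave_radiation", "electricity_demand"],
    "energy_product_type": ["energy"],
    "experiment": ["rcp_4_5", "rcp_8_5"],
}

# Hypothetical constraints: the first has no energy_product_type key at all.
constraints = [
    {
        "variable": ["surface_downwelling_shortwave_radiation"],
        "experiment": ["rcp_4_5", "rcp_8_5"],
    },
    {
        "variable": ["electricity_demand"],
        "energy_product_type": ["energy", "power"],
        "experiment": ["rcp_4_5", "rcp_8_5"],
    },
]

valid_requests = intersect_with_constraints(request, constraints)
granules = [g for valid in valid_requests for g in expand(valid)]
```

Because each valid request keeps only the keys of the constraint it came from, the radiation granules no longer carry `energy_product_type`, and a template conditional that expects the key to be absent can match them.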

Below is a request which demonstrates the issue: the old system finds 4 URLs, whereas the new system finds only 3. The new system misses surface_downwelling_shortwave_radiation because the whole request includes energy_product_type, which should be absent/null for that variable according to the Jinja template.

collection_id = 'sis-energy-derived-projections'
request = {
  "variable": ["surface_downwelling_shortwave_radiation", "electricity_demand"],
  "spatial_aggregation": ["country_level"],
  "temporal_aggregation": ["monthly"],
  "energy_product_type": ["energy"],
  "experiment": ["rcp_4_5", "rcp_8_5"],
  "rcm": ["racmo22e", "cclm4_8_17"],
  "gcm": ["hadgem2_es", "mpi_esm_lr"],
}

For testing we can use this version of the entry where I have removed large parts of the dataset to focus on the bug: https://cds-dev-cci2.copernicus-climate.eu/datasets/test-adaptor-url-debugging?tab=download

malmans2 commented 4 months ago

Closed by #141