bopen / c3s-eqc-toolbox-template

CADS Toolbox template application
Apache License 2.0

Can't download data #162

Open camila-trigoso opened 1 day ago

camila-trigoso commented 1 day ago

What happened?

I have problems when using the function download.download_and_transform. It works when I download data that my coworker has already downloaded, but if I move the start/end time outside that range, it fails.

Minimal Complete Verifiable Example

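# Imports assumed from context (not shown in the original report)
from c3s_eqc_automatic_quality_control import diagnostics, download, utils

# Time
start = "1997-01"
stop = "2001-01"  # <---- It does not work when I try to download data beyond 2000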

# Region
lon_slice = slice(-92.10, -84.80)
lat_slice = slice(46.30, 49.00)

# Variable
varname = "lake_surface_water_temperature"
collection_id = "satellite-lake-water-temperature"
request = {
    "version": "4.0",
    "variable": "all",
    "format": "zip",
}
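

def spatial_weighted_mean_of_region(ds, lon_slice, lat_slice, varname):
    # Keep only the requested variable, crop to the region,
    # then reduce to a spatially weighted mean
    ds = ds[[varname]]
    ds = utils.regionalise(ds, lon_slice=lon_slice, lat_slice=lat_slice)
    ds = diagnostics.spatial_weighted_mean(ds)
    return ds


# Split the download into one request per month
chunks = {"year": 1, "month": 1}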
requests = download.update_request_date(
    request, start=start, stop=stop, stringify_dates=True
)
ds = download.download_and_transform(
    collection_id,
    requests,
    chunks=chunks,
    transform_func=spatial_weighted_mean_of_region,
    transform_func_kwargs={
        "lon_slice": lon_slice,
        "lat_slice": lat_slice,
        "varname": varname,
    },
)

Relevant log output

76%|███████▌  | 37/49 [00:01<00:00, 36.54it/s]2024-10-25 07:51:26,475 INFO [2024-09-28T00:00:00] **Welcome to the New Climate Data Store (CDS)!** This new system is in its early days of full operations and still undergoing enhancements and fine tuning. Some disruptions are to be expected. Your 
[feedback](https://jira.ecmwf.int/plugins/servlet/desk/portal/1/create/202) is key to improve the user experience on the new CDS for the benefit of everyone. Thank you.
2024-10-25 07:51:26,476 WARNING [2024-09-26T00:00:00] Should you have not yet migrated from the old CDS system to the new CDS, please check our [informative page](https://confluence.ecmwf.int/x/uINmFw) for guidance.
2024-10-25 07:51:26,477 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2024-10-25 07:51:26,477 INFO [2024-09-16T00:00:00] Remember that you need to have an ECMWF account to use the new CDS. **Your old CDS credentials will not work in new CDS!**
2024-10-25 07:51:26,477 WARNING [2024-06-16T00:00:00] CDS API syntax is changed and some keys or parameter names may have also changed. To avoid requests failing, please use the "Show API request code" tool on the dataset Download Form to check you are using the correct syntax for your API request.
2024-10-25 07:51:26,902 INFO [2024-09-28T00:00:00] **Welcome to the New Climate Data Store (CDS)!** This new system is in its early days of full operations and still undergoing enhancements and fine tuning. Some disruptions are to be expected. Your 
[feedback](https://jira.ecmwf.int/plugins/servlet/desk/portal/1/create/202) is key to improve the user experience on the new CDS for the benefit of everyone. Thank you.
2024-10-25 07:51:26,903 WARNING [2024-09-26T00:00:00] Should you have not yet migrated from the old CDS system to the new CDS, please check our [informative page](https://confluence.ecmwf.int/x/uINmFw) for guidance.
2024-10-25 07:51:26,904 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2024-10-25 07:51:26,904 INFO [2024-09-16T00:00:00] Remember that you need to have an ECMWF account to use the new CDS. **Your old CDS credentials will not work in new CDS!**
2024-10-25 07:51:26,905 WARNING [2024-06-16T00:00:00] CDS API syntax is changed and some keys or parameter names may have also changed. To avoid requests failing, please use the "Show API request code" tool on the dataset Download Form to check you are using the correct syntax for your API request.
 76%|███████▌  | 37/49 [00:02<00:00, 14.10it/s]
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[25], line 5
      1 chunks = {"year": 1, "month": 1}
      2 requests = download.update_request_date(
      3     request, start=start, stop=stop, stringify_dates=True
      4 )
----> 5 ds = download.download_and_transform(
      6     collection_id,
      7     requests,
      8     chunks=chunks,
      9     transform_func=spatial_weighted_mean_of_region,
     10     transform_func_kwargs={
     11         "lon_slice": lon_slice,
     12         "lat_slice": lat_slice,
     13         "varname": varname,
     14     },
     15 )

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/c3s_eqc_automatic_quality_control/download.py:614, in download_and_transform(collection_id, requests, chunks, split_all, transform_func, transform_func_kwargs, transform_chunks, n_jobs, invalidate_cache, cached_open_mfdataset_kwargs, quiet, **open_mfdataset_kwargs)
    607             cacholote.delete(
    608                 func.func, *func.args, request_list=[request], **func.keywords
    609             )
    610         with (
    611             cacholote.config.set(return_cache_entry=True),
    612             _set_env(tqdm_disable=True),
    613         ):
--> 614             sources.append(func(request_list=[request]).result["args"][0]["href"])
    615     ds = xr.open_mfdataset(sources, **cached_open_mfdataset_kwargs)
    616 else:
    617     # Cache final dataset transformed

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cacholote/cache.py:102, in cacheable.<locals>.wrapper(*args, **kwargs)
     99                 warnings.warn(str(ex), UserWarning)
    100                 clean._delete_cache_entries(session, cache_entry)
--> 102 result = func(*args, **kwargs)
    103 cache_entry = database.CacheEntry(
    104     key=hexdigest,
    105     expiration=settings.expiration,
    106     tag=settings.tag,
    107 )
    108 try:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/c3s_eqc_automatic_quality_control/download.py:436, in _download_and_transform_requests(collection_id, request_list, transform_func, transform_func_kwargs, **open_mfdataset_kwargs)
    429 def _download_and_transform_requests(
    430     collection_id: str,
    431     request_list: list[dict[str, Any]],
   (...)
    434     **open_mfdataset_kwargs: Any,
    435 ) -> xr.Dataset:
--> 436     sources = get_sources(collection_id, request_list)
    437     preprocess = functools.partial(
    438         _preprocess,
    439         collection_id=collection_id,
    440         preprocess=open_mfdataset_kwargs.pop("preprocess", None),
    441     )
    443     grib_ext = (".grib", ".grb", ".grb1", ".grb2")

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/c3s_eqc_automatic_quality_control/download.py:349, in get_sources(collection_id, request_list)
    347 disable = os.getenv("TQDM_DISABLE", "False") == "True"
    348 for request in tqdm.tqdm(request_list, disable=disable):
--> 349     sources.update(retrieve(collection_id, request))
    350 return list(sources)

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/c3s_eqc_automatic_quality_control/download.py:339, in retrieve(collection_id, request)
    334 def retrieve(collection_id: str, request: dict[str, Any]) -> list[str]:
    335     with cacholote.config.set(
    336         return_cache_entry=False,
    337         io_delete_original=True,
    338     ):
--> 339         return [file.path for file in _cached_retrieve(collection_id, request)]

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cacholote/cache.py:102, in cacheable.<locals>.wrapper(*args, **kwargs)
     99                 warnings.warn(str(ex), UserWarning)
    100                 clean._delete_cache_entries(session, cache_entry)
--> 102 result = func(*args, **kwargs)
    103 cache_entry = database.CacheEntry(
    104     key=hexdigest,
    105     expiration=settings.expiration,
    106     tag=settings.tag,
    107 )
    108 try:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/c3s_eqc_automatic_quality_control/download.py:324, in _cached_retrieve(collection_id, request)
    322 if NOCACHE:
    323     request = request | {"nocache": datetime.datetime.now().isoformat()}
--> 324 ds = earthkit.data.from_source("cds", collection_id, request, prompt=False)
    325 if isinstance(ds, ShapeFileReader) and hasattr(ds._parent, "_path_and_parts"):
    326     # Do not unzip vector data
    327     sources = [ds._parent._path_and_parts]

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:150, in from_source(name, lazily, *args, **kwargs)
    147     return from_source_lazily(name, *args, **kwargs)
    149 prev = None
--> 150 src = get_source(name, *args, **kwargs)
    151 while src is not prev:
    152     prev = src

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:131, in SourceMaker.__call__(self, name, *args, **kwargs)
    128     klass = find_plugin(os.path.dirname(__file__), name, loader)
    129     self.SOURCES[name] = klass
--> 131 source = klass(*args, **kwargs)
    133 if getattr(source, "name", None) is None:
    134     source.name = name

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/core/__init__.py:22, in MetaBase.__call__(cls, *args, **kwargs)
     20 obj = cls.__new__(cls, *args, **kwargs)
     21 args, kwargs = cls.patch(obj, *args, **kwargs)
---> 22 obj.__init__(*args, **kwargs)
     23 return obj

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/cds.py:126, in CdsRetriever.__init__(self, dataset, prompt, *args, **kwargs)
    123 nthreads = min(self.settings("number-of-download-threads"), len(self.requests))
    125 if nthreads < 2:
--> 126     self.path = [self._retrieve(dataset, r) for r in self.requests]
    127 else:
    128     with SoftThreadPool(nthreads=nthreads) as pool:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/cds.py:126, in <listcomp>(.0)
    123 nthreads = min(self.settings("number-of-download-threads"), len(self.requests))
    125 if nthreads < 2:
--> 126     self.path = [self._retrieve(dataset, r) for r in self.requests]
    127 else:
    128     with SoftThreadPool(nthreads=nthreads) as pool:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/cds.py:140, in CdsRetriever._retrieve(self, dataset, request)
    137     self.source_filename = cds_result.location.split("/")[-1]
    138     cds_result.download(target=target)
--> 140 return_object = self.cache_file(
    141     retrieve,
    142     (dataset, request),
    143     extension=EXTENSIONS.get(request.get("format"), ".cache"),
    144 )
    145 return return_object

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:68, in Source.cache_file(self, create, args, **kwargs)
     65 if owner is None:
     66     owner = re.sub(r"(?!^)([A-Z]+)", r"-\1", self.__class__.__name__).lower()
---> 68 return cache_file(owner, create, args, **kwargs)

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/core/caching.py:1053, in cache_file(owner, create, args, hash_extra, extension, force, replace)
   1051 with FileLock(lock):
   1052     if not os.path.exists(path):  # Check again, another thread/process may have created the file
-> 1053         owner_data = create(path + ".tmp", args)
   1054         os.rename(path + ".tmp", path)
   1055 try:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/earthkit/data/sources/cds.py:136, in CdsRetriever._retrieve.<locals>.retrieve(target, args)
    135 def retrieve(target, args):
--> 136     cds_result = self.client().retrieve(args[0], args[1])
    137     self.source_filename = cds_result.location.split("/")[-1]
    138     cds_result.download(target=target)

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/legacy_api_client.py:164, in LegacyApiClient.retrieve(self, name, request, target)
    162 submitted: Remote | Results
    163 if self.wait_until_complete:
--> 164     submitted = self.logging_decorator(self.client.submit_and_wait_on_results)(
    165         collection_id=name,
    166         **request,
    167     )
    168 else:
    169     submitted = self.logging_decorator(self.client.submit)(
    170         collection_id=name,
    171         **request,
    172     )

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/legacy_api_client.py:147, in LegacyApiClient.logging_decorator.<locals>.wrapper(*args, **kwargs)
    142 @functools.wraps(func)
    143 def wrapper(*args: Any, **kwargs: Any) -> Any:
    144     with LoggingContext(
    145         logger=processing.LOGGER, quiet=self.quiet, debug=self._debug
    146     ):
--> 147         return func(*args, **kwargs)

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/api_client.py:454, in ApiClient.submit_and_wait_on_results(self, collection_id, **request)
    438 def submit_and_wait_on_results(
    439     self, collection_id: str, **request: Any
    440 ) -> cads_api_client.Results:
    441     """Submit a request and wait for the results to be ready.
    442 
    443     Parameters
   (...)
    452     cads_api_client.Results
    453     """
--> 454     return self._retrieve_api.submit(collection_id, **request).make_results()

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/processing.py:711, in Processing.submit(self, collection_id, **request)
    710 def submit(self, collection_id: str, **request: Any) -> Remote:
--> 711     return self.get_process(collection_id).submit(**request)

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/processing.py:306, in Process.submit(self, **request)
    294 def submit(self, **request: Any) -> cads_api_client.Remote:
    295     """Submit a request.
    296 
    297     Parameters
   (...)
    304     cads_api_client.Remote
    305     """
--> 306     job = Job.from_request(
    307         "post",
    308         f"{self.url}/execution",
    309         json={"inputs": request},
    310         **self._request_kwargs,
    311     )
    312     return job.make_remote()

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/processing.py:147, in ApiResponse.from_request(cls, method, url, headers, session, retry_options, request_options, download_options, sleep_max, cleanup, log_messages, **kwargs)
    142 response = robust_request(
    143     method, url, headers=headers, **request_options, **kwargs
    144 )
    145 LOGGER.debug(f"REPLY {response.text}")
--> 147 cads_raise_for_status(response)
    149 self = cls(
    150     response,
    151     headers=headers,
   (...)
    157     cleanup=cleanup,
    158 )
    159 if log_messages:

File /data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/cads_api_client/processing.py:84, in cads_raise_for_status(response)
     77     else:
     78         message = "\n".join(
     79             [
     80                 f"{response.status_code} Client Error: {response.reason} for url: {response.url}",
     81                 error_json_to_message(error_json),
     82             ]
     83         )
---> 84         raise requests.HTTPError(message, response=response)
     85 response.raise_for_status()

HTTPError: 400 Client Error: Bad Request for url: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/satellite-lake-water-temperature/execution
invalid request
Request has not produced a valid combination of values, please check your selection.
{'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31'], 'month': '02', 'variable': 'all', 'version': '4.0', 'year': '2000'}

Anything else we need to know?

I am running on the VM as trigoso_camila. Because I could not figure out what was wrong, I created a CDS account with the following email: camilatrba@gmail.com. I have the .cdsapirc on my laptop and have accepted the terms and conditions for downloading with this account. I was able to download from the website, but I can't download using the c3s_eqc_automatic_quality_control library.

I copied my .cdsapirc to the server using scp -r "C:\Users\Camila Trigoso.cdsapirc" wp5@136.156.129.56:/data/wp5/trigoso_camila/LakeSurfaceTemperature/data/, because the initial error said I did not have the license. After that, I got this error:

HTTPError: 400 Client Error: Bad Request for url: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/satellite-lake-water-temperature/execution
invalid request
Request has not produced a valid combination of values, please check your selection.
{'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31'], 'month': '02', 'variable': 'all', 'version': '4.0', 'year': '2000'}

Environment

malmans2 commented 1 day ago

It looks like a breaking change in the new CDS (Cristina used the legacy CDS, as the new CDS was only recently announced).

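The problem is that February does not have 31 days: the failing request asks for days 01 to 31 of 2000-02, which the new CDS rejects as an invalid combination of values. You should be able to use larger chunks, as this data is not too big, which will hopefully fix the problem.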

Can you try running the same code but using chunks = {"year": 1}?
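
To illustrate the February issue, here is a minimal sketch of a request that lists only the days that actually exist in the month. The helper `valid_days` is hypothetical (it is not part of c3s_eqc_automatic_quality_control or the CDS API), and the request keys simply mirror the rejected request in the traceback; whether the new CDS accepts such a request has not been verified here.

```python
import calendar


def valid_days(year: int, month: int) -> list[str]:
    """Return the zero-padded day strings that exist in the given month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    n_days = calendar.monthrange(year, month)[1]
    return [f"{day:02d}" for day in range(1, n_days + 1)]


# Hypothetical request mirroring the one rejected in the traceback,
# but restricted to the days that exist in February 2000.
request = {
    "version": "4.0",
    "variable": "all",
    "year": "2000",
    "month": "02",
    "day": valid_days(2000, 2),
}

print(len(request["day"]), request["day"][-1])
```

With monthly chunks, the rejected request paired month 02 with days 01 to 31; the sketch only shows what a per-month day list without the non-existent days would look like.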

Let me know!