ecmwf / earthkit-data

A format-agnostic Python interface for geospatial data
Apache License 2.0

Cannot retrieve data from the CDS when the cache is switched off #204

Closed malmans2 closed 12 months ago

malmans2 commented 1 year ago

What happened?

I tried to use earthkit just to retrieve data from the CDS, but it fails when the cache is switched off.

What are the steps to reproduce the bug?

import earthkit.data

earthkit.data.settings.auto_save_settings = False
earthkit.data.settings.set("cache-policy", "off")

ds = earthkit.data.from_source(
    "cds",
    "reanalysis-era5-single-levels",
    variable=["2t", "msl"],
    product_type="reanalysis",
    area=[50, -10, 40, 10],  # N,W,S,E
    grid=[2, 2],
    date="2012-05-10",
    time="12:00",
)

Version

0.3.1

Platform (OS and architecture)

Darwin MacBook-Pro-3.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul 5 22:21:56 PDT 2023; root:xnu-8796.141.3~6/RELEASE_X86_64 x86_64

Relevant log output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 6
      3 earthkit.data.settings.auto_save_settings = False
      4 earthkit.data.settings.set("cache-policy", "off")
----> 6 ds = earthkit.data.from_source(
      7     "cds",
      8     "reanalysis-era5-single-levels",
      9     variable=["2t", "msl"],
     10     product_type="reanalysis",
     11     area=[50, -10, 40, 10],  # N,W,S,E
     12     grid=[2, 2],
     13     date="2012-05-10",
     14     time="12:00",
     15 )

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:143, in from_source(name, lazily, *args, **kwargs)
    140     return from_source_lazily(name, *args, **kwargs)
    142 prev = None
--> 143 src = get_source(name, *args, **kwargs)
    144 while src is not prev:
    145     prev = src

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:124, in SourceMaker.__call__(self, name, *args, **kwargs)
    117 klass = find_plugin(os.path.dirname(__file__), name, loader)
    119 # if os.environ.get("FIEDLIST_TESTING_ENABLE_MOCKUP_SOURCE", False):
    120 #     from earthkit.data.mockup import SourceMockup
    121
    122 #     klass = SourceMockup
--> 124 source = klass(*args, **kwargs)
    126 if getattr(source, "name", None) is None:
    127     source.name = name

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/core/__init__.py:21, in MetaBase.__call__(cls, *args, **kwargs)
     19 obj = cls.__new__(cls, *args, **kwargs)
     20 args, kwargs = cls.patch(obj, *args, **kwargs)
---> 21 obj.__init__(*args, **kwargs)
     22 return obj

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/cds.py:92, in CdsRetriever.__init__(self, dataset, *args, **kwargs)
     89 nthreads = min(self.settings("number-of-download-threads"), len(requests))
     91 if nthreads < 2:
---> 92     self.path = [self._retrieve(dataset, r) for r in requests]
     93 else:
     94     with SoftThreadPool(nthreads=nthreads) as pool:

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/cds.py:92, in <listcomp>(.0)
     89 nthreads = min(self.settings("number-of-download-threads"), len(requests))
     91 if nthreads < 2:
---> 92     self.path = [self._retrieve(dataset, r) for r in requests]
     93 else:
     94     with SoftThreadPool(nthreads=nthreads) as pool:

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/cds.py:104, in CdsRetriever._retrieve(self, dataset, request)
    101 def retrieve(target, args):
    102     self.client().retrieve(args[0], args[1], target)
--> 104 return self.cache_file(
    105     retrieve,
    106     (dataset, request),
    107     extension=EXTENSIONS.get(request.get("format"), ".cache"),
    108 )

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/sources/__init__.py:62, in Source.cache_file(self, create, args, **kwargs)
     59 if owner is None:
     60     owner = re.sub(r"(?!^)([A-Z]+)", r"-\1", self.__class__.__name__).lower()
---> 62 return cache_file(owner, create, args, **kwargs)

File ~/mambaforge/envs/earthkit/lib/python3.11/site-packages/earthkit/data/core/caching.py:872, in cache_file(owner, create, args, hash_extra, extension, force, replace)
    856 """Creates a cache file in the earthkit-data cache-directory (defined in the :py:class:`Settings`).
    857 Uses :py:func:`_register_cache_file()`
    858
   (...)
    869     Full path to the cache file.
    870 """
    871 if not CACHE.policy.has_cache() or CACHE.cache_directory() is None:
--> 872     raise RuntimeError("Cache is disabled. Cannot create cache file.")
    874 m = hashlib.sha256()
    875 m.update(owner.encode("utf-8"))

RuntimeError: Cache is disabled. Cannot create cache file.

Accompanying data

No response

Organisation

B-Open / CADS-EQC

sandorkertesz commented 1 year ago

Hi @malmans2, thank you for reporting this issue. Actually, it is not a bug: when the cache is switched off, there is nowhere to store the retrieved data. Unfortunately, this is not yet documented.
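
To illustrate the point, here is a minimal, standard-library-only sketch of the pattern visible in the traceback: the retrieval callback is handed a target path by the cache, so with no cache there is no path to write to. `ToyCache` and `fake_retrieve` are hypothetical names invented for this illustration; this is not earthkit-data's actual implementation.

```python
import os
import tempfile


class ToyCache:
    """Hypothetical stand-in for a download cache (not earthkit-data's class)."""

    def __init__(self, policy):
        self.policy = policy
        # Only an enabled cache owns a directory where files can be written.
        self.directory = tempfile.mkdtemp() if policy != "off" else None

    def cache_file(self, create, args, extension=".cache"):
        # Mirrors the check in the traceback: no cache means no target path.
        if self.directory is None:
            raise RuntimeError("Cache is disabled. Cannot create cache file.")
        target = os.path.join(self.directory, "data" + extension)
        create(target, args)  # the callback writes the download to `target`
        return target


def fake_retrieve(target, args):
    """Hypothetical downloader: writes placeholder bytes instead of real data."""
    with open(target, "wb") as f:
        f.write(b"placeholder")


# With a cache, the callback has somewhere to write.
path = ToyCache("temporary").cache_file(fake_retrieve, ("dataset", {}))
print(os.path.exists(path))

# Without a cache, the call fails before any retrieval is attempted.
try:
    ToyCache("off").cache_file(fake_retrieve, ("dataset", {}))
except RuntimeError as err:
    print(err)
```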

malmans2 commented 1 year ago

Right, I wasn't sure whether to open it as a bug or a feature request.

I have a follow-up question: Are we supposed to use earthkit instead of cdsapi to only retrieve CDS raw data, or is that out of scope? I.e., if I want a single copy of CDS raw data (no cache), something like from_source("cds", ...).save(...) won't work?
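
For comparison, the cdsapi-only route mentioned in the question could look roughly like the sketch below. It relies on `cdsapi.Client().retrieve(dataset, request, target)`, the same call that appears in the traceback above. The request values are copied unchanged from the reproducer; whether the CDS accepts them in exactly this form is an assumption, and `download` is a hypothetical helper name.

```python
# Request copied from the reproducer in this issue (not validated against the CDS).
REQUEST = {
    "variable": ["2t", "msl"],
    "product_type": "reanalysis",
    "area": [50, -10, 40, 10],  # N,W,S,E
    "grid": [2, 2],
    "date": "2012-05-10",
    "time": "12:00",
}


def download(target, dataset="reanalysis-era5-single-levels", request=REQUEST):
    """Write a single copy of the raw data straight to `target`, with no cache."""
    import cdsapi  # third-party package; needs CDS credentials to be configured

    client = cdsapi.Client()
    client.retrieve(dataset, request, target)
    return target


# Usage (requires network access and a CDS account):
# download("era5.grib")
```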

sandorkertesz commented 1 year ago

> I.e., if I want a single copy of CDS raw data (no cache), something like from_source("cds", ...).save(...) won't work?

I think this should work. We are about to review the caching and try to support this scenario.
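
Until that scenario is supported, one possible workaround is to keep a throwaway cache rather than none at all, and then save a copy. The sketch below assumes that the installed earthkit-data version accepts "temporary" as a value for "cache-policy" and that the object returned by from_source has a save() method, as the question above suggests. `retrieve_and_save` is a hypothetical helper name, and none of this has been verified against version 0.3.1.

```python
def retrieve_and_save(target, dataset, **request):
    """Retrieve from the CDS through a temporary cache and copy the result to `target`."""
    import earthkit.data  # third-party package; needs CDS credentials to be configured

    # Do not persist the changed settings to the user's configuration file.
    earthkit.data.settings.auto_save_settings = False
    # Assumption: "temporary" is a valid policy in the installed version.
    earthkit.data.settings.set("cache-policy", "temporary")

    ds = earthkit.data.from_source("cds", dataset, **request)
    ds.save(target)  # assumption: save() writes the retrieved data to `target`
    return target


# Usage (requires network access and a CDS account):
# retrieve_and_save(
#     "era5.grib",
#     "reanalysis-era5-single-levels",
#     variable=["2t", "msl"],
#     product_type="reanalysis",
#     area=[50, -10, 40, 10],  # N,W,S,E
#     grid=[2, 2],
#     date="2012-05-10",
#     time="12:00",
# )
```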

sandorkertesz commented 1 year ago

This issue is addressed in #246