NVIDIA / earth2studio

Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
https://nvidia.github.io/earth2studio/
Apache License 2.0

🐛[BUG]: Data from ARCO and WB2 is not cached #96

Closed: mariusaurus closed this issue 1 month ago

mariusaurus commented 1 month ago

Version

0.3.0a0

On which installation method(s) does this occur?

Source

Describe the issue

Data downloaded from ARCO and WB2 is not cached. The respective folders ~/.cache/earth2studio/{arco,wb2} are created but remain empty. Below is a minimal example that reproduces the behaviour. For this run, earth2studio was installed from source inside a Modulus 24.04 container.

from earth2studio.data import ARCO, WB2Climatology, fetch_data
from numpy import datetime64, timedelta64

for data in (ARCO(cache=True), WB2Climatology(cache=True)):
    xx, meta = fetch_data(
        source=data,
        time=[datetime64('2023-01-01')],
        variable=['t2m'],
        lead_time=[0, (timedelta64(6, 'h')).astype('timedelta64[ns]')],
    )

    print(f'{xx.shape=}')
    print(f'{meta.keys()=}')
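
To confirm the symptom after running the example, the cache folders can be listed directly. This is a minimal standard-library sketch, not part of earth2studio: the helper name `cache_entries` is hypothetical, and the root path assumes the default location ~/.cache/earth2studio mentioned above.

```python
from pathlib import Path


def cache_entries(root: Path, sources=("arco", "wb2")) -> dict[str, list[str]]:
    """Return the file names found under each source's cache folder.

    Hypothetical helper for illustration; a missing folder is reported
    as an empty list, the same as a folder that exists but holds no files.
    """
    found = {}
    for name in sources:
        folder = root / name
        if folder.is_dir():
            # Walk recursively so nested cache layouts are also counted.
            found[name] = sorted(p.name for p in folder.rglob("*") if p.is_file())
        else:
            found[name] = []
    return found


if __name__ == "__main__":
    # Assumed default cache root, taken from the path quoted in this issue.
    root = Path.home() / ".cache" / "earth2studio"
    for name, files in cache_entries(root).items():
        print(f"{name}: {len(files)} cached file(s)")
```

If caching worked, both counts would be greater than zero after the first run; with the behaviour described here they stay at zero.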

NickGeneva commented 1 month ago

GCSFS doesn't seem to have much native caching ability. These data sources will likely need to be wrapped in a CachingFileSystem to work properly.
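
As a rough illustration of the wrapping idea, here is a sketch using fsspec's built-in caching layer. It is not the earth2studio implementation, and it makes several assumptions: it uses the whole-file variant (`filecache` protocol) rather than the block-based `CachingFileSystem`, and it uses a local temporary directory as a stand-in for the remote bucket so that it runs offline. For GCS, the target protocol would presumably be `gs` with gcsfs installed.

```python
import os
import tempfile

import fsspec

# Stand-in for the remote store: a local directory holding one file.
# Assumption for illustration only; the real target would be a GCS bucket.
remote_dir = tempfile.mkdtemp()
remote_path = os.path.join(remote_dir, "t2m.bin")
with open(remote_path, "wb") as f:
    f.write(b"\x00" * 1024)

# Local directory that should receive cached copies.
cache_dir = tempfile.mkdtemp()

# Wrap the target filesystem in fsspec's whole-file cache.
fs = fsspec.filesystem(
    "filecache",
    target_protocol="file",  # would be "gs" for a GCS-backed source
    cache_storage=cache_dir,
)

# The first read copies the file into cache_dir; later reads of the
# same path are served from the local copy.
with fs.open(remote_path, "rb") as f:
    payload = f.read()

cached_files = os.listdir(cache_dir)
print(f"read {len(payload)} bytes, cache now holds {len(cached_files)} entries")
```

The whole-file cache is the simpler choice when files are read in full. For large chunked stores where only parts of each object are read, the block-based cache may be a better fit, but that trade-off is not settled in this thread.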