NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.
https://nvidia.github.io/earth2mip/
Apache License 2.0

🐛[BUG]: Download API issues #182

Open adamjstewart opened 5 months ago

adamjstewart commented 5 months ago

Version

0.1.0

On which installation method(s) does this occur?

Source

Describe the issue

The download API is a bit finicky. First, when I use earth2mip.networks.get_model(...) with a remote model name, it takes a long time to instantiate (presumably because it is downloading a model checkpoint), and there is no progress bar (tqdm would help here). Second, because I didn't know what was happening, I interrupted the process with Ctrl+C and tried again. Every subsequent call fails with the following error:

>>> m = get_model("e2mip://pangu", device="cuda:8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 345, in get_model
    return _load_package_builtin(package, device, name=url.netloc)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 291, in _load_package_builtin
    return inference_loader(package, device=device)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/pangu.py", line 266, in load
    model_6 = PanguStacked(PanguWeather(p6))
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/pangu.py", line 74, in __init__
    os.stat(self.path)
FileNotFoundError: [Errno 2] No such file or directory: '/home/adam/.cache/earth2mip/models/pangu/pangu_weather_6.onnx'

Ideally, this should just redownload the model instead of crashing because the model file is missing from the cache. The current workaround is to delete this model directory and try again.
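
For anyone else hitting this, the workaround can be scripted. This is only a sketch: clear_cached_model is a hypothetical helper, not part of earth2mip, and the cache location is the one shown in the traceback above, so it may differ on other setups.

```python
import shutil
from pathlib import Path


def clear_cached_model(name: str, cache_root: Path) -> bool:
    """Delete the cache directory for one model.

    Hypothetical helper (not an earth2mip API). Returns True if a
    directory was removed, False if there was nothing to delete.
    """
    model_dir = cache_root / name
    if not model_dir.is_dir():
        return False
    shutil.rmtree(model_dir)
    return True


# Cache root taken from the traceback above; adjust if yours differs.
# clear_cached_model("pangu", Path.home() / ".cache" / "earth2mip" / "models")
```

After clearing the directory, calling get_model again should trigger a fresh download.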

Environment details

Linux Ubuntu 22.04

nbren12 commented 5 months ago

Thanks for the report. I've done this myself many times.

Indeed, the caching logic is not very robust right now. When writing to the cache, we could write to e.g. .cache/earth2mip/models/pangu/pangu_weather_6.onnx.partial or similar, and then shutil.move the file to its final location. This would make the cache operations atomic (provided the partial file and the final location are on the same filesystem) and avoid leaving the cache in a broken state.
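
A rough sketch of that idea, not the actual earth2mip code. atomic_write and write_fn are hypothetical names, and I've used os.replace rather than shutil.move here, since the partial file is created next to the destination and a same-filesystem rename is atomic on POSIX.

```python
import os
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


def atomic_write(dest: Path, write_fn: Callable[[BinaryIO], object]) -> None:
    """Write to a temporary '.partial' file, then rename it into place.

    Hypothetical helper: write_fn receives an open binary file and is
    expected to write the full payload (e.g. a downloaded checkpoint).
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the destination directory so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".partial"
    )
    try:
        with os.fdopen(fd, "wb") as f:
            write_fn(f)
        # The destination appears only once the payload is complete.
        os.replace(tmp_name, dest)
    except BaseException:
        # BaseException also covers Ctrl+C (KeyboardInterrupt).
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With something like this, an interrupted download leaves no file at the final path, so a later existence check on the cache would see the model as missing rather than finding a half-populated directory.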

And yes, a progress bar would be helpful too. Pangu is very slow to download.
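
For the progress bar, one possible shape (an assumption on my part, not how earth2mip currently downloads files) is a chunked copy that reports the number of bytes written through a callback. copy_with_progress and report are hypothetical names.

```python
from typing import BinaryIO, Callable


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    report: Callable[[int], object],
    chunk_size: int = 1 << 20,
) -> int:
    """Copy src to dst in chunks, calling report(n) after each chunk.

    Hypothetical helper. Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        report(len(chunk))
    return total
```

If tqdm is installed, the update method of a bar created with tqdm(total=size_in_bytes, unit="B", unit_scale=True) could be passed as report, which keeps tqdm an optional dependency.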