NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.
https://nvidia.github.io/earth2mip/
Apache License 2.0

🐛[BUG]: Not able to load models on CPU #183

Open adamjstewart opened 5 months ago

adamjstewart commented 5 months ago

Version

0.1.0

On which installation method(s) does this occur?

Source

Describe the issue

The default behavior of earth2mip.networks.get_model(...) is to load the model on the CPU, but this default fails:

>>> from earth2mip.networks import get_model
>>> m = get_model("e2mip://dlwp")
/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-ecmwflibs-0.6.1-ir2ftnwp2o644qfwe3pxs46i7xznemce/lib/python3.10/site-packages/ecmwflibs/__init__.py:144: UserWarning: ecmwflibs: using provided 'ECMWFLIBS_ECCODES' set to '/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/eccodes-2.34.0-5mcekl7wxd72hrvaq7pmv4lw7cyoqoou/lib/libeccodes.so
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 345, in get_model
    return _load_package_builtin(package, device, name=url.netloc)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 291, in _load_package_builtin
    return inference_loader(package, device=device)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/dlwp.py", line 155, in load
    with torch.cuda.device(device):
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-torch-2.2.1-yni5thh74mlihxkm4dm7ap5wtafhornn/lib/python3.10/site-packages/torch/cuda/__init__.py", line 370, in __init__
    self.idx = _get_device_index(device, optional=True)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-torch-2.2.1-yni5thh74mlihxkm4dm7ap5wtafhornn/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
    raise ValueError(f"Expected a cuda device, but got: {device}")
ValueError: Expected a cuda device, but got: cpu

It seems the models are hard-coded to load on CUDA, even when no CUDA device is available (CPU, ROCm, MPS, etc.).

Environment details

Linux Ubuntu 22.04
nbren12 commented 5 months ago

Thanks. I agree this seems like a bug in the dlwp loader.

I'm not sure if it is broken for other models.

cc @ktangsali @NickGeneva