NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.
https://nvidia.github.io/earth2mip/
Apache License 2.0

🐛[BUG]: Unable to run inference_ensemble for models other than fcnv2_sm #192

Open david5010 opened 1 month ago

david5010 commented 1 month ago

Version

main

On which installation method(s) does this occur?

Source

Describe the issue

I followed the example provided here, but it only seems to work with fcnv2_sm. I tried to run DLWP, GraphCast, and Pangu, and each ran into a different issue.

DLWP: RuntimeError: MKL FFT error: Intel MKL DFTI ERROR: Inconsistent configuration parameters, raised at ensemble_utils.py line 171

GraphCast: Missing metadata.json (unsure how to download the relevant package)

Pangu: This one might be working, but onnxruntime has trouble finding my GPU

Environment details

git clone
pip install .

nbren12 commented 1 month ago

Thanks for the report. There seem to be a few separate issues here. I'd guess part of it is related to your installation, since it looks like you are not using the GPU for either Pangu or DLWP inference.
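One quick way to check the GPU point is a small diagnostic like the sketch below. It only reports what PyTorch and ONNX Runtime can see in the current environment; the function name `gpu_report` is made up for illustration and is not part of earth2mip. It uses `torch.cuda.is_available()` and `onnxruntime.get_available_providers()`, and it tolerates either package being absent.

```python
def gpu_report():
    """Report GPU visibility for PyTorch and ONNX Runtime.

    Values are None when the corresponding package is not installed.
    """
    report = {}

    try:
        import torch
        report["torch_cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda_available"] = None  # torch is not installed

    try:
        import onnxruntime as ort
        providers = ort.get_available_providers()
        report["onnxruntime_providers"] = providers
        # CUDA inference needs this provider to be listed.
        report["onnxruntime_cuda"] = "CUDAExecutionProvider" in providers
    except ImportError:
        report["onnxruntime_providers"] = None  # onnxruntime is not installed
        report["onnxruntime_cuda"] = None

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `onnxruntime_cuda` comes back False, one common cause is that the CPU-only `onnxruntime` package is installed rather than `onnxruntime-gpu`. That is a guess about this setup, not something confirmed in the report.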

It may also be easier to start with one of the simpler examples, e.g. https://nvidia.github.io/earth2mip/examples/02_model_comparison.html. I'm not sure we've used ensemble inference with DLWP.

I could probably help more if you provide full error messages.

david5010 commented 1 month ago

For Pangu-Weather, I think the model is working, although my installation might have some issues with onnxruntime. As for DLWP, here are the error messages. They differ depending on whether I use GFS or IFS as the data source:

With IFS initial conditions: [image]

Config:

    config = {
        "ensemble_members": args.members,
        "noise_amplitude": 0.05,
        "simulation_length": args.lead_time,
        "weather_event": {
            "properties": {
                "name": "Globe",
                "start_time": formatted_date,
                "initial_condition_source": "ifs",
            },
            "domains": [
                {
                    "name": "global",
                    "type": "Window",
                    "diagnostics": [
                        {
                            "type": "raw",
                            "channels": ["t2m", "u10m", "v10m"],
                        }
                    ],
                }
            ],
        },
        # TODO: Format so that it goes into {YYYYMMDD}.t{HH-00,06,12,18}z/{init-ifs or init-gfs}/
        "output_path": f"{args.output_dir}/{args.date.strftime('%Y%m%d.t%Hz')}/ecmwfdlwp",
        "output_frequency": 1,
        "weather_model": "dlwp",
        "seed": 12345,
        "use_cuda_graphs": False,
        "ensemble_batch_size": 1,
        "autocast_fp16": False,
        "perturbation_strategy": "correlated",
        "noise_reddening": 2.0
    }

With GFS: [image]

Config:

    config = {
        "ensemble_members": args.members,
        "noise_amplitude": 0.05,
        "simulation_length": args.lead_time,
        "weather_event": {
            "properties": {
                "name": "Globe",
                "start_time": formatted_date,
                "initial_condition_source": "gfs",
            },
            "domains": [
                {
                    "name": "global",
                    "type": "Window",
                    "diagnostics": [
                        {
                            "type": "raw",
                            "channels": ["t2m", "u10m", "v10m"],
                        }
                    ],
                }
            ],
        },
        # TODO: Format so that it goes into {YYYYMMDD}.t{HH-00,06,12,18}z/{init-ifs or init-gfs}/
        "output_path": f"{args.output_dir}/{args.date.strftime('%Y%m%d.t%Hz')}/ecmwfdlwp",
        "output_frequency": 1,
        "weather_model": "dlwp",
        "seed": 12345,
        "use_cuda_graphs": False,
        "ensemble_batch_size": 1,
        "autocast_fp16": False,
        "perturbation_strategy": "correlated",
        "noise_reddening": 2.0
    }
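The TODO in both configs asks for an output layout of `{YYYYMMDD}.t{HH}z/{init-ifs or init-gfs}/`. A minimal sketch of a helper that builds such a path might look like the following. The function name `build_output_path` and the trailing per-model subdirectory are assumptions added for illustration; only the date/cycle and `init-<source>` components come from the TODO itself.

```python
from datetime import datetime
from pathlib import PurePosixPath

VALID_SOURCES = ("ifs", "gfs")
VALID_CYCLE_HOURS = (0, 6, 12, 18)


def build_output_path(output_dir, init_time, source, model):
    """Return '<output_dir>/<YYYYMMDD>.t<HH>z/init-<source>/<model>'.

    The final <model> component is a hypothetical addition so that runs of
    different models with the same initial conditions do not collide.
    """
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    if init_time.hour not in VALID_CYCLE_HOURS:
        raise ValueError(f"init hour must be one of {VALID_CYCLE_HOURS}, got {init_time.hour}")

    cycle = init_time.strftime("%Y%m%d.t%Hz")  # e.g. 20240102.t06z
    return str(PurePosixPath(output_dir) / cycle / f"init-{source}" / model)


if __name__ == "__main__":
    print(build_output_path("outputs", datetime(2024, 1, 2, 6), "gfs", "dlwp"))
```

In the configs above, the result could replace the f-string currently assigned to `"output_path"`, with `source` taken from the same value used for `"initial_condition_source"`.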

As for graphcast_operational, it says that I'm missing metadata.json. I'm not sure how to get it, even after running pip install .[graphcast].

I hope this information helps. Please let me know what other error messages I can provide!

ndp99VN commented 1 month ago

Hello, I got the same error, and I also tried running basic inference. When I import GraphCast with import earth2mip.networks.graphcast as graphcast, it fails because no module named haiku can be found.

david5010 commented 1 month ago

> Hello I got the same error and I also tried to run with basic inference. When I import graphcast with import earth2mip.networks.graphcast as graphcast, it cannot find module named haiku for me.

For this, try running pip install .[graphcast] at the root of the repo, and also run pip install -r requirements.txt.

Were you able to register the GraphCast model? In .cache/, I only see fcnv2, pangu, and dlwp, but not graphcast.
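To narrow down both symptoms (the missing haiku module and the missing metadata.json), a small check like the sketch below may help. The helper names are hypothetical and not part of earth2mip. It also assumes that each cached model lives in its own subdirectory containing a metadata.json file, which is inferred from the error message and not confirmed. The cache location used in the example is likewise an assumption and should be replaced with wherever your models are actually stored.

```python
import importlib.util
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def packages_with_metadata(cache_dir):
    """Map each subdirectory of cache_dir to whether it contains metadata.json.

    Returns an empty dict when cache_dir does not exist.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return {}
    return {
        entry.name: (entry / "metadata.json").is_file()
        for entry in sorted(root.iterdir())
        if entry.is_dir()
    }


if __name__ == "__main__":
    # 'haiku' is the import name of the dm-haiku package.
    print("missing modules:", missing_modules(["haiku", "jax", "graphcast"]))
    # Assumed cache location; adjust to your setup.
    cache = Path.home() / ".cache" / "earth2mip"
    for name, has_metadata in packages_with_metadata(cache).items():
        print(f"{name}: metadata.json {'found' if has_metadata else 'MISSING'}")
```

If haiku is reported missing even after installing the graphcast extra, the install may have gone into a different Python environment than the one running inference.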

ndp99VN commented 1 month ago

May I have your email so that I can contact you for details? I'm also trying to run the DLWP, Pangu, and GraphCast models, but I get different errors with each of them.