HFAiLab / OpenCastKit

The open-source solutions of FourCastNet and GraphCast
MIT License

cdsapi: ERA5 total precipitation accumulation #23

Open Aquila96 opened 1 year ago

Aquila96 commented 1 year ago

In data_factory/download_era5.py, the script downloads 6-hour total precipitation by specifying 6-hour intervals.

But after reviewing the data and reading the ERA5 cdsapi demo, it seems that the API only returns 1-hour accumulated precipitation; for multi-hour accumulated precipitation, one has to download all the preceding hours and sum them afterwards.
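As a minimal sketch of what a corrected request might look like: ERA5 stores `tp` as the accumulation over the preceding hour, so a 6-hour total ending at hour H needs the six hourly fields H-5 through H. The `tp_request` helper below is hypothetical (not from the repository); it only builds the cdsapi request dictionary, and the actual `retrieve` call is shown commented out since it needs CDS credentials.

```python
# Hedged sketch: build a cdsapi request for hourly ERA5 total precipitation.
# ERA5 "tp" at hour H is the accumulation over (H-1, H], so a 6-hour total
# ending at end_hour needs the six hourly fields end_hour-5 .. end_hour.
# (Assumes end_hour >= 5; totals spanning midnight need the previous day too.)

def tp_request(year: str, month: str, day: str, end_hour: int) -> dict:
    hours = [f"{h:02d}:00" for h in range(end_hour - 5, end_hour + 1)]
    return {
        "product_type": "reanalysis",
        "variable": "total_precipitation",
        "year": year,
        "month": month,
        "day": day,
        "time": hours,
        "format": "netcdf",
    }

# Usage (requires a configured ~/.cdsapirc):
#   import cdsapi
#   cdsapi.Client().retrieve(
#       "reanalysis-era5-single-levels",
#       tp_request("2020", "01", "01", 12),
#       "tp_hourly.nc",
#   )
```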

daniellucs2002 commented 7 months ago

Yeah, I agree with you. The GraphCast paper also mentions that the precipitation data needs to be accumulated over the past six hours. So, have you tried modifying the code, or figured out how to prepare the training data at least? (I plan to train the model from scratch if the provided weights file can't be used.)

Aquila96 commented 7 months ago

> Yeah, I agree with you. The GraphCast paper also mentions that the precipitation data needs to be accumulated over the past six hours. So, have you tried modifying the code, or figured out how to prepare the training data at least? (I plan to train the model from scratch if the provided weights file can't be used.)

cdsapi is rather straightforward to use; you can refer to NVIDIA's download demos here and here. For precipitation specifically, I just fetch the data from all 6 previous hours and add them together, similar to the approach in the ECMWF demo.
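The summing step described above can be sketched in a few lines. This is a hedged illustration, not code from the repository: it assumes the six hourly `tp` fields have been loaded into a NumPy array stacked along the first axis.

```python
import numpy as np

# Hedged sketch: given six hourly "tp" fields (each the accumulation over
# the preceding hour) stacked as (hour, lat, lon), the 6-hour total
# precipitation is simply their sum along the hour axis.
def accumulate_6h(hourly_tp: np.ndarray) -> np.ndarray:
    assert hourly_tp.shape[0] == 6, "expects exactly the six preceding hours"
    return hourly_tp.sum(axis=0)

# Toy example: a constant 1 m/h of precipitation over a 2x2 grid
toy = np.ones((6, 2, 2))
print(accumulate_6h(toy)[0, 0])  # 6.0
```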

daniellucs2002 commented 7 months ago

Thanks for answering! I remember you mentioned the inconsistency in model architecture between the Hugging Face checkpoint and this repository in another issue. Have you addressed that problem and run the project on a GPU cluster?

Aquila96 commented 7 months ago

> Thanks for answering! I remember you mentioned the inconsistency in model architecture between the Hugging Face checkpoint and this repository in another issue. Have you addressed that problem and run the project on a GPU cluster?

No problem. Regarding the model weights, I wouldn't recommend trying to modify the source code to match the provided weights. For FourCastNet, we compared results from both this version and NVIDIA's official version, and this version's results are worse, though I can't remember by how large a margin. For GraphCast, DeepMind has since released its implementation and weights in JAX, and the results are mostly on par with the claimed state-of-the-art performance. You can also read the PyTorch implementation in Modulus.