RolnickLab / ClimateSet

A Large-Scale Climate Model Dataset for Machine Learning
GNU General Public License v3.0

Missing climate models and ensemble members #13

Open jirvin16 opened 2 months ago

jirvin16 commented 2 months ago

Thank you for putting together an amazing dataset for the AI+climate community!

It looks like the dataset hosted on Hugging Face is missing several files. It only seems to have 21 climate models (rather than the 36 stated in the paper), and among the included climate models, several ensemble members seem to be missing (e.g. CAMS-CSM1-0 only has 1, but the paper states it has 2). I believe several scenarios are missing as well.
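To make the gap concrete, one could compare a file listing of the hosted dataset against the counts in the paper. The sketch below is only illustrative: it assumes a hypothetical path layout `<model>/<member>/<scenario>/<file>` (the real ClimateSet layout on Hugging Face may differ), and the helper names `summarize_listing`, `missing_members` and `missing_scenarios` are made up for this example.

```python
from collections import defaultdict


def summarize_listing(paths):
    """Group file paths into {model: {member: set(scenarios)}}.

    Assumes a hypothetical layout "<model>/<member>/<scenario>/<file>";
    the actual ClimateSet directory structure may be different.
    """
    tree = defaultdict(lambda: defaultdict(set))
    for path in paths:
        parts = path.split("/")
        if len(parts) < 4:
            continue  # skip top-level files such as README.md
        model, member, scenario = parts[0], parts[1], parts[2]
        tree[model][member].add(scenario)
    return tree


def missing_members(tree, expected_members):
    """Return {model: number of missing members}.

    expected_members maps model name -> member count (e.g. taken from the paper).
    Models absent from the listing count all of their members as missing.
    """
    gaps = {}
    for model, expected in expected_members.items():
        found = len(tree.get(model, {}))  # .get does not create new entries
        if found < expected:
            gaps[model] = expected - found
    return gaps


def missing_scenarios(tree, expected_scenarios):
    """Return {(model, member): sorted list of scenarios not found}."""
    gaps = {}
    for model, members in tree.items():
        for member, found in members.items():
            absent = set(expected_scenarios) - found
            if absent:
                gaps[(model, member)] = sorted(absent)
    return gaps
```

The list of paths could come from `huggingface_hub.list_repo_files(repo_id, repo_type="dataset")`, with `repo_id` set to the dataset's actual repository; the expected counts would have to be transcribed from the paper by hand.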

Would it be possible to upload the missing data, or was their exclusion intentional?

Thanks again.

liellnima commented 2 months ago

Hi Jeremy,

I am happy you find ClimateSet helpful!

Yes, the dataset is indeed missing several files (and still has some issues here and there). To separate the issues:

In summary: the exclusion is intentional; however, we would like to add the missing data.

We are currently working on redoing the whole ClimateSet pipeline and hope to provide a ClimateSet Python package that includes the full dataset and a smooth pipeline by the end of this year (2024). Unfortunately, the folks working on this (including me) are doing it on the side, and our main tasks and research projects keep us busy.

I think our new approach will help us get a cleaner setup and dataset, and track down the causes of the currently missing datasets :)

If you want to contribute and accelerate things, please let me know - I am super happy to include anyone who has time for this :)