Code not working with CMIP5/EUROCORDEX data

zazass8 commented 3 months ago

I have been using your Git repository "climatereconstructionAI" to reconstruct low-resolution images from CMIP5 simulations that contain missing values. Specifically, these values form a "hole" in the middle of the dataset, representing the absence of the inner domain. Hence the dataset only includes the "buffer zone" of the data, with the "hole" in the middle. I want to reconstruct these images into complete low-resolution images that match the low-resolution EUROCORDEX simulations.
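For illustration, the "buffer zone with a hole" layout described above can be sketched with NumPy. This is only a minimal sketch: the function name `make_hole_mask`, the grid size, and the buffer width are hypothetical, and the convention that 1 marks known cells and 0 marks missing cells is an assumption that should be checked against what the repository expects.

```python
import numpy as np

def make_hole_mask(n_lat, n_lon, buffer_width):
    """Return a 2D mask: 1 in the outer buffer zone, 0 in the inner hole.

    The 1 = known / 0 = missing convention is an assumption for this sketch.
    """
    if 2 * buffer_width >= min(n_lat, n_lon):
        raise ValueError("buffer_width leaves no inner domain")
    mask = np.ones((n_lat, n_lon), dtype=np.float32)
    # Zero out the inner domain, leaving a frame of width buffer_width.
    mask[buffer_width:n_lat - buffer_width, buffer_width:n_lon - buffer_width] = 0.0
    return mask

# Hypothetical 8 x 10 grid with a 2-cell-wide buffer zone.
mask = make_hole_mask(n_lat=8, n_lon=10, buffer_width=2)

# Stand-in for one time step of the data; the hole is set to NaN.
field = np.arange(80, dtype=np.float32).reshape(8, 10)
masked_field = np.where(mask == 1, field, np.nan)
```

Here the mask keeps the outer frame and blanks the inner 4 x 6 block, which mirrors a dataset that only contains the buffer zone.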

I have already split the dataset into training, validation and test sets and stored them in the corresponding directories. In addition, I generated the masks for each set and stored them in the corresponding directories as well. Since you mention in the paper that you remap the data onto a 5/2.5 deg. equally shaped grid, I tried to do something similar for the coordinate scales of my dataset by modifying the degrees in the dataset_format.json file. Within the file, I also changed the lat/lon scales to those of my dataset. I then set the corresponding arguments in the arguments list, such as the data directories, data types, number of encoding/decoding layers, and hyperparameters such as learning rate and batch size (I set more than these).

I followed the instructions in both your repository and the publication, but the code does not seem to be compatible with the NetCDF files I provide. Many errors have been raised, such as "should expect a copy of train/val/test sets in each data directory, and vice versa" (which I already fixed), "The latitude grid is not uniform", and several related to the source code itself. I tried to debug the code to make it compatible with my data, but I was still not able to find a solution.
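The message "The latitude grid is not uniform" suggests that the spacing of the latitude coordinate is being checked. A quick way to inspect one's own coordinates is sketched below. The helper `is_uniform` and its tolerance are hypothetical and are not taken from the repository, so the actual check in the code may be stricter or looser.

```python
import numpy as np

def is_uniform(coords, rtol=1e-4):
    """Return True if consecutive coordinate steps are all (nearly) equal.

    Hypothetical helper for this sketch; the tolerance is an assumption.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.size < 3:
        return True  # fewer than two steps, nothing to compare
    steps = np.diff(coords)
    return bool(np.allclose(steps, steps[0], rtol=rtol, atol=0.0))

# A regular 5-degree latitude axis.
regular_lat = np.linspace(30.0, 70.0, 9)

# An irregular axis, with made-up values for illustration.
irregular_lat = np.array([30.0, 34.8, 39.7, 44.7, 49.8])

regular_ok = is_uniform(regular_lat)
irregular_ok = is_uniform(irregular_lat)
```

The coordinate arrays could be read from a NetCDF file with, for example, `xarray.open_dataset(path)["lat"].values`, where the variable name "lat" is an assumption about the file.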

If possible, I would appreciate some guidance on what I am doing wrong. I haven't seen anyone else post on the issue tracker about problems using the code, so I assume it should work properly even with my datasets.

F.Y.I. neither the CMIP5 nor the EUROCORDEX images are on equally shaped grids, as mentioned in the paper, and the two grids are also mismatched. I already tried remapping the CMIP5 dataset to a square grid, but it still wasn't working.

When testing the code, I repeatedly got error messages saying that the training and validation sets are not equally shared after the split.

I saw in your dataset_format.json file that you used the name of the dataset (e.g. HadCRUT4) as the key. I didn't really understand the difference between "hadcrut" and "hadcrut-mod", which you used as keys in the data structure, and I am not entirely sure what I should be using for my data.

In the arguments list, there is no argument for specifying where the target data are stored, apart from the number of data-names to be used as target data.

eplesiat commented 3 months ago

Dear zazass8, thank you for your feedback.

You mention multiple issues that may require a closer look at your NetCDF files. To start with, I can try to give some hints regarding two points you mentioned:

- The spatial grid of your data does not need to be square. However, it should have a regular spacing in both the longitudinal and latitudinal directions. If this is not the case, I would recommend regridding your NetCDF files before using CRAI.
- The file dataset_format.json is not required in your case. You can simply provide the filenames of the NetCDF files stored in your "train" and "val" folders using the --data-names option. The corresponding data will be used as input (masked version) and as target data (complete original data).
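As a rough illustration of the regridding step, the sketch below linearly interpolates a 2D field from an irregular rectilinear grid onto evenly spaced axes using `numpy.interp`. It assumes 1D, strictly increasing `lat` and `lon` axes; the function name `regrid_rectilinear` and all values are hypothetical. It does not handle curvilinear or rotated-pole grids (which EUROCORDEX data often use); for those, dedicated tools such as CDO or xESMF are commonly used instead.

```python
import numpy as np

def regrid_rectilinear(field, lat, lon, new_lat, new_lon):
    """Linearly interpolate a 2D (lat, lon) field onto new 1D axes.

    Assumes lat and lon are 1D and strictly increasing. np.interp does not
    extrapolate, so the new axes should stay within the original range.
    """
    field = np.asarray(field, dtype=float)
    # Interpolate along longitude for every latitude row.
    tmp = np.array([np.interp(new_lon, lon, row) for row in field])
    # Interpolate along latitude for every new longitude column.
    out = np.array([np.interp(new_lat, lat, col) for col in tmp.T]).T
    return out

# Irregularly spaced axes with made-up values for illustration.
lat = np.array([30.0, 34.8, 39.7, 44.7, 49.8])
lon = np.array([0.0, 2.4, 5.1, 7.5, 10.0])
field = lat[:, None] + 2.0 * lon[None, :]  # simple linear test field

# Evenly spaced target axes covering the same range.
new_lat = np.linspace(lat[0], lat[-1], 5)
new_lon = np.linspace(lon[0], lon[-1], 5)
regridded = regrid_rectilinear(field, lat, lon, new_lat, new_lon)
```

Note that missing values (NaN) in the input propagate into neighbouring interpolated cells, so it may be preferable to regrid the complete data first and apply the mask afterwards.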

I hope I could clarify some of your doubts. Please, let us know if you need further assistance.

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 30 days with no activity.

FREVA-CLINT / climatereconstructionAI

Code not working with CMIP5/EUROCORDEX data #29