ecmwf / anemoi-datasets

Apache License 2.0

Ability to create datasets when coordinates may not be neatly named "latitude" and "longitude" #63

Open mariahpope opened 2 weeks ago

mariahpope commented 2 weeks ago

Is your feature request related to a problem? Please describe.

The class DefaultCoordinateGuesser(CoordinateGuesser) in the src/anemoi/datasets/create/functions/sources/xarray/flavour.py script currently only works with datasets whose coordinate names appear in one of its if statements. If a coordinate name is not covered by any of them, it fails with an error saying the coordinate is not supported. An easy workaround in the meantime is to go in and add my own if statement for whatever my coordinate is called (for example, when it is not neatly named "latitude" or "level").
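To make the failure mode concrete, here is a minimal, hypothetical sketch of a name-matching guesser built from a fixed chain of if statements. It is NOT the actual code in flavour.py; the function name `guess_role` and the accepted names are assumptions chosen for illustration. It shows why a coordinate such as "grid_yt" is rejected even though it is really a latitude.

```python
# Hypothetical illustration of a hard-coded, name-matching coordinate guesser.
# This is NOT the real flavour.py code; the function and name lists are assumed.


def guess_role(name: str) -> str:
    """Map a coordinate name to a role using a fixed chain of if statements."""
    lowered = name.lower()
    if lowered in ("latitude", "lat"):
        return "latitude"
    if lowered in ("longitude", "lon"):
        return "longitude"
    if lowered in ("time", "valid_time"):
        return "time"
    if lowered in ("level", "lev"):
        return "level"
    # Any name not listed above is rejected outright, even if it is a latitude.
    raise NotImplementedError(f"Coordinate {name!r} not supported")


print(guess_role("latitude"))  # latitude
try:
    guess_role("grid_yt")
except NotImplementedError as exc:
    print(exc)  # Coordinate 'grid_yt' not supported
```

Adding another `if` branch for each new name works, but it means editing library code for every dataset, which is what this issue is asking to avoid.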

Describe the solution you'd like

Some functionality to define the names of our lat, lon, level, etc. coordinates in the dataset we are loading? Or a more expansive list of if statements covering more of the names coordinates could have?

Overall, a way to not fail if my coordinate name is not currently included.
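One way to get both behaviours requested above is sketched below, under stated assumptions: a user-supplied mapping is checked first, the built-in guesses are used as a fallback, and an unknown name produces a warning instead of an exception. `DEFAULT_NAMES` and `resolve_role` are hypothetical names for illustration, not part of the anemoi-datasets API.

```python
import warnings
from typing import Dict, Optional

# Hypothetical built-in guesses (assumed names, for illustration only).
DEFAULT_NAMES: Dict[str, str] = {
    "latitude": "latitude",
    "lat": "latitude",
    "longitude": "longitude",
    "lon": "longitude",
    "time": "time",
    "level": "level",
}


def resolve_role(name: str, user_names: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the role for `name`, preferring a user-supplied mapping.

    Unknown names emit a warning and return None instead of raising.
    """
    if user_names and name in user_names:
        return user_names[name]
    role = DEFAULT_NAMES.get(name.lower())
    if role is None:
        warnings.warn(f"Coordinate {name!r} not recognised; skipping it")
    return role


# Hypothetical user mapping from dataset-specific names to roles.
user = {"grid_yt": "latitude", "grid_xt": "longitude", "pfull": "level"}
print(resolve_role("grid_yt", user))  # latitude
print(resolve_role("time", user))     # time (from the built-in fallback)
```

Checking the user mapping first lets a dataset override a built-in guess, while the fallback keeps commonly named datasets working with no extra configuration.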

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

b8raoult commented 2 weeks ago

Hi, can you try providing a dictionary as in that test: https://github.com/ecmwf/anemoi-datasets/blob/3c8a1dea4f7033cd6c3fafd60ffe036b6e561966/tests/xarray/test_zarr.py#L71

b8raoult commented 2 weeks ago

This is not yet documented as it is work in progress. Could you also provide a sample file or url?

mariahpope commented 1 week ago

Hello!

Thanks for getting back to me.

Sample file URL: "gs://noaa-ufs-gefsv13replay/ufs-hr1/0.25-degree-subsampled/03h-freq/zarr/fv3.zarr"

Dictionary of coordinates:

    flavour: {
        "rules": {
            "latitude": {"name": "grid_yt"},
            "longitude": {"name": "grid_xt"},
            "time": {"name": "time"},
            "level": {"name": "pfull"},
        },
        "levtype": "pl",
    }

I was able to create a unit test that passed with that dictionary, and I was also able to add it to my YAML and create a dataset. However, you will notice that this dataset has two extra coordinates, "cftime" and "ftime", that are not actually needed for our purposes here. A warning says those coordinates are not supported, but things appear to keep running. If I leave those coordinates out of the dictionary in my YAML, should everything still run and just drop them?
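As a rough illustration of the "warn and keep going" behaviour described here, the sketch below inverts the `rules` dictionary from the YAML above (role to name becomes name to role), classifies each coordinate, and warns about and skips anything that no rule covers. This is a hypothetical stand-alone sketch, not what anemoi-datasets does internally; whether the library really drops unmatched coordinates is exactly the open question. The function name `classify_coordinates` is assumed, and the coordinate list only contains the names mentioned in this thread, so the real fv3.zarr store may have others.

```python
import warnings
from typing import Dict, List

# The "rules" from the YAML above, written as a Python dict.
RULES: Dict[str, Dict[str, str]] = {
    "latitude": {"name": "grid_yt"},
    "longitude": {"name": "grid_xt"},
    "time": {"name": "time"},
    "level": {"name": "pfull"},
}


def classify_coordinates(coord_names: List[str], rules: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each coordinate name to a role via `rules`; warn about and skip the rest."""
    # Invert role -> {"name": ...} into name -> role for direct lookups.
    name_to_role = {spec["name"]: role for role, spec in rules.items()}
    kept: Dict[str, str] = {}
    for name in coord_names:
        role = name_to_role.get(name)
        if role is None:
            warnings.warn(f"Coordinate {name!r} not supported; ignoring it")
            continue
        kept[name] = role
    return kept


# Coordinate names mentioned in this thread for the fv3.zarr sample (assumed list).
coords = ["grid_yt", "grid_xt", "time", "pfull", "cftime", "ftime"]
print(classify_coordinates(coords, RULES))
# {'grid_yt': 'latitude', 'grid_xt': 'longitude', 'time': 'time', 'pfull': 'level'}
```

Running this also emits two warnings, one for "cftime" and one for "ftime", while the four mapped coordinates are returned.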