JuliaClimate / ClimateTools.jl

Climate science package for Julia
https://juliaclimate.github.io/ClimateTools.jl/stable/

difficulty reading CF compliant files #135

Open gaelforget opened 4 years ago

gaelforget commented 4 years ago

After loading one of my files via Panoply to verify that there was nothing wrong with it (see below), I tried the model = load(gcm_files, "tasmax", poly=poly_reg) example and got ERROR: Manually verify x/lat dimension name.

Taking a look at the code, I see that getdim_lat relies on a list of hard-coded names. I thought the more general approach was to rely on long_name + units. Not sure what to suggest -- adding to the hard-coded list would be a short-term fix just for me...

  lon_c   (720)
    Datatype:    Float64
    Dimensions:  lon_c
    Attributes:
     units                = degrees_east
     long_name            = longitude
(screenshot attached)
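
For reference, the listing above can be reproduced programmatically with NCDatasets.jl (the NetCDF backend used by ClimateTools). A minimal sketch, assuming a placeholder file name:

    using NCDatasets

    # Print each variable's dimensions, units and long_name -- the metadata that a
    # units-based lookup would key on. "ETAN.nc" is a placeholder file name.
    NCDataset("ETAN.nc") do ds
        for name in keys(ds)
            v = ds[name]
            units = haskey(v.attrib, "units") ? v.attrib["units"] : ""
            lname = haskey(v.attrib, "long_name") ? v.attrib["long_name"] : ""
            println(rpad(name, 12), string(dimnames(v)), "  units=", units, "  long_name=", lname)
        end
    end
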
gaelforget commented 4 years ago

Also, the next file I am planning to present to ClimateTools is CF-compliant but not on a regular lat-lon grid (see below). But I am going to wait a bit before I try that.

(screenshot attached)
Balinus commented 4 years ago

Thanks for the input! Indeed, this is certainly not an elegant function. From memory, this was coded for a project that involved regional climate models (your second case).

Not sure the extraction of lon_c based on long_name is robust, though. It seems more robust to go with the detected dimensions. For instance, a regional climate model will not have longitude among its dimensions. It will have a longitude grid, though, with long_name set to longitude. If I rely on detecting, say, longitude, we will extract the longitude grid and not the native dimension, which could be in meters, degrees on a stereographic grid, etc...

Open to suggestions though as hardcoding this is not a robust solution either.
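
A rough sketch of what "going with the detected dimensions" could look like (placeholder file and variable names; not the package's actual code): start from the data variable's own dimensions and inspect each dimension's coordinate variable, instead of scanning for any variable whose long_name happens to be longitude.

    using NCDatasets

    # "file.nc" and "tasmax" are placeholders for an actual dataset and variable.
    NCDataset("file.nc") do ds
        for dname in dimnames(ds["tasmax"])      # the variable's native dimensions
            if dname in keys(ds)                 # dimension that has a coordinate variable
                a = ds[dname].attrib
                units = haskey(a, "units") ? a["units"] : ""
                println(dname, "  units=", units)
            else
                println(dname, "  (no coordinate variable)")
            end
        end
    end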

gaelforget commented 4 years ago

Open to suggestions though as hardcoding this is not a robust solution either.

Cool. Will take a deeper look and might send a PR later if I find a way to improve the code.

regional climate models (your second case)

Just to clarify, I use sets of these files that collectively add up to global model variables

Balinus commented 4 years ago

Just to clarify, I use sets of these files that collectively add up to global model variables

You mean like "tiles"?

lmilechin commented 4 years ago

Just for reference: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#latitude-coordinate

From what I've seen with other tools, they detect dimensions using the units, which is what the CF Conventions seems to imply as well.
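
As a rough illustration of that units-based approach, a possible replacement for the hard-coded name lookup could look like the sketch below (hypothetical helper, not the package's getdim_lat); the unit strings are the ones the CF conventions list for latitude and longitude coordinates:

    using NCDatasets

    # Unit strings the CF conventions accept for geographic coordinates.
    const LAT_UNITS = ("degrees_north", "degree_north", "degree_N",
                       "degrees_N", "degreeN", "degreesN")
    const LON_UNITS = ("degrees_east", "degree_east", "degree_E",
                       "degrees_E", "degreeE", "degreesE")

    # Hypothetical helper: return the name of the first variable whose units
    # attribute identifies it as a latitude coordinate, or nothing if none is found.
    function find_latitude_var(ds::NCDataset)
        for name in keys(ds)
            a = ds[name].attrib
            if haskey(a, "units") && a["units"] in LAT_UNITS
                return name
            end
        end
        return nothing
    end

Note that on a rotated-pole file such a lookup returns the 2-D latitude grid rather than the native rlat dimension, which is the ambiguity raised above.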

Balinus commented 4 years ago

Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

I'm gonna rework this extraction part asap.

gaelforget commented 4 years ago

Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

As highlighted by @lmilechin, it is the units attribute that should be used to identify coordinates per the CF guidelines -- as opposed to standard_name, which is only optional and, e.g., does not distinguish between different longitude conventions.

I'm gonna rework this extraction part asap.

Great! Thanks

Balinus commented 4 years ago

To effectively tackle this issue, having access to some problematic datasets would be welcomed.

gaelforget commented 4 years ago

To effectively tackle this issue, having access to some problematic datasets would be welcomed.

How about using the files I mentioned at the top of this thread?

These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:

outputs/nctiles-newfiles/interp/ETAN.nc
outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc

ps. I just reran the notebook in binder & regenerated these without problem

gaelforget commented 4 years ago

Just to clarify, I use sets of these files that collectively add up to global model variables

You mean like "tiles"?

Yes -- one tile = 1 file in this example

Balinus commented 4 years ago

To effectively tackle this issue, having access to some problematic datasets would be welcomed.

How about using the files I mentioned at the top of this thread?

These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:

outputs/nctiles-newfiles/interp/ETAN.nc
outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc

ps. I just reran the notebook in binder & regenerated these without problem

Thanks, I was able to produce the files at home.

Balinus commented 4 years ago

Also, I re-read the thread and wanted to clarify: when I spoke about "dimension" I was mostly referring to the dimensions of the dataset, not the units/measure of the variable itself. Hence the need to distinguish between a rotated latitude "dimension" and the latitude grid (a variable in the dataset, distinct from the dimension variable) for projected grids.

Anyway, I'll be forced to think about a more general solution to this!

edit - For example, for this dataset, there are rlat and rlon dimensions.

Dimensions
   rlat = 412
   rlon = 424
   time = 2920
   bnds = 2

Variables
  lat   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = latitude
     long_name            = latitude
     units                = degrees_north

  lon   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = longitude
     long_name            = longitude
     units                = degrees_east

  pr   (424 × 412 × 2920)
    Datatype:    Float32
    Dimensions:  rlon × rlat × time
    Attributes:
     grid_mapping         = rotated_pole
     _FillValue           = 1.0e20
     missing_value        = 1.0e20
     standard_name        = precipitation_flux
     long_name            = Precipitation
     units                = kg m-2 s-1
     coordinates          = lon lat
     cell_methods         = time: mean

  rlat   (412)
    Datatype:    Float64
    Dimensions:  rlat
    Attributes:
     standard_name        = grid_latitude
     long_name            = latitude in rotated pole grid
     units                = degrees
     axis                 = Y

  rlon   (424)
    Datatype:    Float64
    Dimensions:  rlon
    Attributes:
     standard_name        = grid_longitude
     long_name            = longitude in rotated pole grid
     units                = degrees
     axis                 = X
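
To make the dimension-versus-grid distinction concrete for the listing above, here is a minimal NCDatasets.jl sketch (placeholder file name) showing what separates the 1-D rlat/rlon dimension variables from the 2-D lat/lon grids:

    using NCDatasets

    # "rcm.nc" is a placeholder for the rotated-pole file listed above.
    NCDataset("rcm.nc") do ds
        println(dimnames(ds["pr"]))                  # ("rlon", "rlat", "time") -- native dimensions
        println(ndims(ds["lat"]))                    # 2 -- lat/lon are coordinate grids, not dimensions
        println(ds["lat"].attrib["units"])           # degrees_north
        println(ds["rlat"].attrib["units"])          # degrees
        println(ds["rlat"].attrib["standard_name"])  # grid_latitude
        println(ds["rlat"].attrib["axis"])           # Y
    end
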
Balinus commented 4 years ago

I've sketched some code in #137

It's pretty rough right now, but so far it works. Just not sure about the robustness though. Haven't had the time to test your files, @gaelforget, but I'm pretty sure it does not work. I'm currently testing for the axis (an optional attribute in CF files) and standard_name attributes of the dimensions. Will add long_name later.
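
For what it's worth, a lookup that accepts either the axis attribute or a recognized standard_name on a 1-D coordinate variable might look roughly like this (hypothetical helper name; not the code in #137):

    using NCDatasets

    # Hypothetical: find the name of the 1-D coordinate variable playing the role
    # of the X (longitude-like) axis, checking axis, then standard_name.
    function find_x_dimension(ds::NCDataset)
        for name in keys(ds)
            v = ds[name]
            ndims(v) == 1 || continue                # dimension variables are 1-D
            a = v.attrib
            ax = haskey(a, "axis") ? a["axis"] : ""
            sn = haskey(a, "standard_name") ? a["standard_name"] : ""
            if ax == "X" || sn in ("longitude", "grid_longitude", "projection_x_coordinate")
                return name
            end
        end
        return nothing
    end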

Balinus commented 4 years ago

@gaelforget In the files produced by the notebook, both lat_c and lon_c have longitude as their long_name attribute.