NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

Unable to specify dimensions after reading ERA5 data with cfpython #803

Closed decadeneo closed 4 weeks ago

decadeneo commented 1 month ago

cf-python version: 3.16.0

I'm working with ERA5 data that I've read using cfpython. The data dimensions are as follows:

field = CF Field: specific_humidity(long_name=time(1), long_name=pressure_level(8), long_name=latitude(721), long_name=longitude(1440)) kg kg**-1

When I try to perform a mean collapse along the 'longitude' dimension using the following command:

f[0].collapse('mean', 'longitude')

I encounter the following error:

ValueError: Can't find the collapse axis identified by 'longitude'

Could anyone provide some guidance on what might be causing this issue and how to resolve it?

sadielbartholomew commented 1 month ago

Hi @decadeneo, thanks for raising the Issue. That is a perfectly valid and descriptive error message, though, so I don't think it is a bug, unless further information shows otherwise. I think it relates to a lack of CF Conventions compliance in your data. But let's see: please can you run the following and share with us what you get:

Once we see those we can explain why you are seeing this error. Thanks.

decadeneo commented 1 month ago

Thank you for the reply.

Here is part of the f.dump() output:


----------------------------------
Field: specific_humidity (ncvar%q)
----------------------------------
Conventions = 'CF-1.6'
_FillValue = -32767
history = '2023-11-12 14:12:12 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-
           client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.
           internal-1699798331.9119155-4508-6-3ce7335b-ec6a-4467-b5d7-a59962491
           034.nc /cache/tmp/3ce7335b-ec6a-4467-b5d7-a59962491034-adaptor.mars.
           internal-1699798330.2697108-4508-8-tmp.grib'
long_name = 'Specific humidity'
missing_value = -32767
standard_name = 'specific_humidity'
units = 'kg kg**-1'

And the environment:

Platform: Linux-4.14.105-19-0024-x86_64-with-glibc2.31
HDF5 library: 1.12.2
netcdf library: 4.9.3-development
udunits2 library: /opt/conda/lib/libudunits2.so.0
esmpy/ESMF: 8.4.1
Python: 3.9.12
dask: 2024.2.0
netCDF4: 1.6.5
psutil: 5.9.0
packaging: 23.2
numpy: 1.26.4
scipy: 1.11.4
matplotlib: 3.8.3
cftime: 1.6.3
cfunits: 3.3.6
cfplot: 3.3.0
cfdm: 1.11.0.0
cf: 3.16.0

sadielbartholomew commented 1 month ago

@decadeneo thanks, though we need to see the whole of the f.dump() and not just the header to know what is going on...

decadeneo commented 1 month ago

Sorry, here is all of it:

----------------------------------
Field: specific_humidity (ncvar%q)
----------------------------------
Conventions = 'CF-1.6'
_FillValue = -32767
history = '2023-11-12 14:12:12 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-
           client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.
           internal-1699798331.9119155-4508-6-3ce7335b-ec6a-4467-b5d7-a59962491
           034.nc /cache/tmp/3ce7335b-ec6a-4467-b5d7-a59962491034-adaptor.mars.
           internal-1699798330.2697108-4508-8-tmp.grib'
long_name = 'Specific humidity'
missing_value = -32767
standard_name = 'specific_humidity'
units = 'kg kg**-1'

Data(long_name=time(1), long_name=pressure_level(8), long_name=latitude(721), long_name=longitude(1440)) = [[[[5.29922264794723e-06, ..., 9.284544728768718e-05]]]] kg kg**-1

Domain Axis: long_name=latitude(721)
Domain Axis: long_name=longitude(1440)
Domain Axis: long_name=pressure_level(8)
Domain Axis: long_name=time(1)

Dimension coordinate: long_name=time
    calendar = 'gregorian'
    long_name = 'time'
    units = 'hours since 1900-01-01 00:00:00.0'
    Data(long_name=time(1)) = [2023-11-07 12:00:00] gregorian

Dimension coordinate: long_name=pressure_level
    long_name = 'pressure_level'
    units = 'millibars'
    Data(long_name=pressure_level(8)) = [200, ..., 1000] millibars

Dimension coordinate: long_name=latitude
    long_name = 'latitude'
    units = 'degrees_north'
    Data(long_name=latitude(721)) = [90.0, ..., -90.0] degrees_north

Dimension coordinate: long_name=longitude
    long_name = 'longitude'
    units = 'degrees_east'
    Data(long_name=longitude(1440)) = [0.0, ..., 359.75] degrees_east

sadielbartholomew commented 1 month ago

Hi, OK I see the issue here. Since your dimension coordinates don't have standard names assigned, only long names, you need to specify that you are referring to the long name via the following, instead of f[0].collapse('mean', 'longitude'):

f[0].collapse('mean', 'long_name=longitude')

I am removing the 'bug' flag, since the error was perfectly reasonable: it was telling you that 'longitude' was not recognised. Unless that precise name is set as a standard name, it will not be recognised, and you will need to include the long_name= prefix. To make your dataset more CF-compliant, you can add such a standard name via:

f.construct("long_name=longitude").standard_name = "longitude"

and you can do similarly for the latitude, time and pressure level constructs. Then you can call the collapse exactly as you first tried, and you won't get the error.
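To apply that to all four coordinates in one go, something along these lines might work. It is only a sketch: the helper name add_standard_names and the mapping are made up for illustration, 'air_pressure' is an assumed standard name for the pressure_level axis (check it suits your data), and the file name in the usage comment is a placeholder. It relies only on construct() as used above and on set_property().

```python
# Mapping from the long_name identities shown in the dump to CF standard names.
# NOTE: "air_pressure" for the pressure_level axis is an assumption.
STANDARD_NAMES = {
    "long_name=time": "time",
    "long_name=pressure_level": "air_pressure",
    "long_name=latitude": "latitude",
    "long_name=longitude": "longitude",
}


def add_standard_names(field, mapping=STANDARD_NAMES):
    """Set a standard_name on each construct selected by its long_name identity.

    `field` is expected to behave like a cf.Field: construct(identity)
    returns a single construct that has a set_property(name, value) method.
    The field is modified in place and also returned.
    """
    for identity, standard_name in mapping.items():
        construct = field.construct(identity)
        construct.set_property("standard_name", standard_name)
    return field


# Usage with cf-python (requires cf to be installed; "era5.nc" is a placeholder):
#
#   import cf
#   f = cf.read("era5.nc")
#   add_standard_names(f[0])
#   zonal_mean = f[0].collapse("mean", "longitude")
```

The changes only affect the field in memory; the file on disk is left as it was unless you write the field out again.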

Does that make sense?

sadielbartholomew commented 1 month ago

I should add that the following section of the documentation should help regarding which identifiers to use: https://ncas-cms.github.io/cf-python/tutorial.html?highlight=long_name#field-identities.
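As a rough illustration of the rule described above, here is a toy model in plain Python. It is not cf-python's actual implementation, and the function names are invented for the example. It only mimics the idea that a bare string is compared against the standard name, while a 'property=value' string is compared against that named property.

```python
def candidate_identities(properties):
    """Return the strings a construct could be selected by (toy model).

    A bare string is offered only when a standard_name exists; every
    string-valued property is also offered in 'name=value' form.
    """
    identities = []
    standard_name = properties.get("standard_name")
    if standard_name is not None:
        identities.append(standard_name)
    identities.extend(
        f"{name}={value}"
        for name, value in properties.items()
        if isinstance(value, str)
    )
    return identities


def matches(identity, properties):
    """True if `identity` would select a construct with these properties."""
    return identity in candidate_identities(properties)


# Properties of the longitude coordinate as shown in the dump above.
lon = {"long_name": "longitude", "units": "degrees_east"}

print(matches("longitude", lon))            # no standard_name set yet
print(matches("long_name=longitude", lon))  # long_name identity

# After adding a standard name, the bare string matches too.
lon["standard_name"] = "longitude"
print(matches("longitude", lon))
```

In this model the first check fails and the other two succeed, which mirrors the behaviour seen in this Issue.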

decadeneo commented 4 weeks ago

Thank you! It's useful.