Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Access Enum in NC4 file #1228

Closed. Marston closed this issue 1 year ago.

Marston commented 1 year ago

I'm trying to access an enum using the Python module netCDF4. The enum is defined in a netCDF4 file. I can see the enum as a type, but I cannot access it. Here's the ncdump output:

group: wem {
  types:
    ubyte enum features_3d {Temperature = 0,
        Pseudo_adiabatic_potential_temperature = 1,
        Dew_point_temperature = 2, Specific_humidity = 3,
        Relative_humidity = 4, Wind_direction = 5, Wind_speed = 6,
        Density = 7, Potential_temperature = 8, Humidity_mixing_ratio = 9,
        Geopotential_height = 10, Absolute_vorticity = 11,
        Relative_vorticity = 12, Relative_divergence = 13,
        Ice_water_mixing_ratio = 14, Density_Altitude = 15,
        Height_D-values = 16, U_wind = 17, V_wind = 18,
        Cloud_mixing_ratio = 19, Rain_water_mixing_ratio = 20} ;
    ubyte enum features_2d {Temperature_2m = 0, Temperature_tropopause = 1,
        Latent_heat_flux = 2, Sensible_heat_flux = 3,
        Surface_Skin_Temperature = 4, Precipitable_water = 5,
        Absolute_Humidity = 6, Maximum_absolute_humidity = 7,
        Horizontal_moisture_convergence = 8, Potential_evaporation_rate = 9,
        Vertical_speed_shear = 10, Pressure_at_the_tropopause = 11,
        Pressure_reduced_to_MSL = 12, Geopotential_height_fzglvl = 13,
        Geopotential_height_tro = 14, Model_terrain_height = 15} ;
    ubyte enum msllevels_t {\0 = 0, \1000 = 1, \3000 = 2, \6000 = 3,
        \9000 = 4, \12000 = 5, \15000 = 6, \18000 = 7, \21000 = 8,
        \24000 = 9, \27000 = 10, \30000 = 11, \33000 = 12, \36000 = 13,
        \39000 = 14, \41000 = 15, \45000 = 16, \48000 = 17, \51000 = 18,
        \54000 = 19, \57000 = 20, \60000 = 21, \65000 = 22, \70000 = 23,
        \75000 = 24, \80000 = 25, \85000 = 26, \90000 = 27, \95000 = 28,
        \100000 = 29, \110000 = 30} ;
  dimensions:
        time = UNLIMITED ; // (1 currently)
        lat = 721 ;
        lon = 1440 ;
        msllevels = 31 ;
        features3d = 21 ;
        features2d = 16 ;
  variables:
        double time(time) ;
                time:units = "seconds since 1990-01-01 00:00:00" ;
                time:long_name = "time" ;
                time:axis = "T" ;
                time:calendar = "Standard" ;
        float lat(lat) ;
                lat:units = "degrees_north" ;
                lat:long_name = "latitude" ;
                lat:axis = "Y" ;
        float lon(lon) ;
                lon:units = "degrees_east" ;
                lon:long_name = "longitude" ;
                lon:axis = "X" ;
        int msllevels(msllevels) ;
        int features3d(features3d) ;
        int features2d(features2d) ;
        double variables3d(time, features3d, msllevels, lat, lon) ;
        double variables2d(time, features2d, lat, lon) ;
  } // group wem
}

But I cannot see it when I print the Dataset object in Python:

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: wem

How do I access the enums? Was it written in the file incorrectly?

jswhit commented 1 year ago

There is an enum datatype defined in the file, but no variables have been created with that type. You can access the enum type with

nc.enumtypes['msllevels_t']

where nc is the Dataset object.

Marston commented 1 year ago

Thanks for the access code. But how do I create the file so that it actually uses the enum? I thought I was using the enum, but clearly I'm missing the mark. Could you please show me an example of how to create variables using the enums?

Marston commented 1 year ago

I'm unable to get that command to work: nc.enumtypes['msllevels_t']. It says that nc has no attribute called enumtypes.

dir(nc)

['CompoundType', 'Dataset', 'Dimension', 'EnumType', 'Group', 'MFDataset', 'MFTime', 'NC_DISKLESS', 'NC_PERSIST', 'VLType', 'Variable', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__has_cdf5_format__', '__has_nc_create_mem__', '__has_nc_inq_format_extended__', '__has_nc_inq_path__', '__has_nc_open_mem__', '__has_parallel4_support__', '__has_pnetcdf_support__', '__has_rename_grp__', '__hdf5libversion__', '__loader__', '__name__', '__netcdf4libversion__', '__package__', '__path__', '__spec__', '__version__', '_netCDF4', 'chartostring', 'date2index', 'date2num', 'default_encoding', 'default_fillvals', 'get_chunk_cache', 'getlibversion', 'glob', 'is_native_big', 'is_native_little', 'ma', 'num2date', 'numpy', 'pathlib', 'posixpath', 'set_chunk_cache', 'stringtoarr', 'stringtochar', 'subprocess', 'sys', 'unicode_error', 'utils', 'warnings', 'weakref']

The EnumType listed there is the class constructor, so it is not what I'm looking for.

jswhit commented 1 year ago

There is an example in the docs at http://unidata.github.io/netcdf4-python/#enum-data-type

Marston commented 1 year ago

I see how to create it, thanks. But why doesn't your access example work? Is it because the file was not written correctly?

jswhit commented 1 year ago

Here's a self-contained example:

import numpy as np
from netCDF4 import Dataset

# Enum type example.
f = Dataset('clouds.nc', 'w')
# Python dict describing the allowed values and their names.
enum_dict = {'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0,
             'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1,
             'Stratocumulus': 3}
# Create the Enum type called 'cloud_t'.
cloud_type = f.createEnumType(np.uint8, 'cloud_t', enum_dict)
print(cloud_type)
time = f.createDimension('time', None)
# Create a 1d variable of type 'cloud_t' called 'primary_cloud'.
# The fill_value is set to the 'Missing' named value.
cloud_var = f.createVariable('primary_cloud', cloud_type, 'time',
                             fill_value=enum_dict['Missing'])
# Write some data to the variable.
cloud_var[:] = [enum_dict['Clear'], enum_dict['Stratus'], enum_dict['Cumulus'],
                enum_dict['Missing'], enum_dict['Cumulonimbus']]
# Close the file and reopen it for reading.
f.close()
f = Dataset('clouds.nc')
print(f.enumtypes)
cloud_var = f.variables['primary_cloud']
print(cloud_var)
print(cloud_var.datatype.enum_dict)
print(cloud_var[:])
f.close()
jswhit commented 1 year ago

Looks like your enum types are defined inside the wem group, so you would have to do

nc['wem'].enumtypes
Marston commented 1 year ago

I created this simple test of what I did, trying to use the enums when creating variables, but I keep getting this error:

import netCDF4 as nc
import numpy as np

ncfile = nc.Dataset('test.nc', mode='w', format='NETCDF4') 
rootgrp = ncfile.createGroup('wem')

dimNames = []
dimsnc = {}
varnc = dict()
dimsAttrs = {'lat': {'attr': {'units': 'degrees_north', 'long_name': 'latitude', 'axis': 'Y'}, 'dtype': 'f4'},
                     'lon': {'attr': {'units': 'degrees_east', 'long_name': 'longitude', 'axis': 'X'}, 'dtype': 'f4'},
                     'plevels': {'attr': {'units': 'Pa', 'long_name': 'isobaric pressure levels', 'axis': 'Z'}, 'dtype': 'f4'},
                     'msllevels': {'attr': {'units': 'ft', 'long_name': 'height in msl', 'axis': 'Z'}, 'dtype': 'f4'},                     
                     'time': {'attr': {'units': 'seconds since 1990-01-01 00:00:00', 'long_name': 'time', 
                                       'axis': 'T', 'calendar': 'Standard'}, 'dtype': 'f8'},
                     'features2d': {'attr': {'units':'-', 'long_name': 'Names of GALWEM 2D variables mapped to indices'}},
                     'features3d': {'attr': {'units':'-', 'long_name': 'Names of GALWEM 3D variables mapped to indices'}}
                    }
print(f'Building the feature dictionaries...')
feat_dict3d = {'Temperature': 0, 'Pseudo_adiabatic_potential_temperature': 1}
feat_dict2d = { 'Temperature_2m': 0, 'Temperature_tropopause': 1 }
print(f'Creating the dimensions for nc4 file...')                                    
nlat = 5
nlon = 5
nlev = 5
lev_dict = {'0': 0, '1000': 1, '3000': 2, '6000': 3, '9000': 4}

features3d = rootgrp.createEnumType(np.uint8,'features3d_t',feat_dict3d)
features2d = rootgrp.createEnumType(np.uint8,'features2d_t',feat_dict2d)
msllevels = rootgrp.createEnumType(np.uint8,'msllevelset_t', lev_dict)

rootgrp.createDimension('time', None)
rootgrp.createDimension('lat', nlat)
rootgrp.createDimension('lon', nlon)

timedimnc = rootgrp.createVariable('time', 'f8', ('time',))
for attr, name in dimsAttrs['time']['attr'].items():
    setattr(timedimnc, attr, name)
latdimnc = rootgrp.createVariable('lat', 'f4', ('lat',))
for attr, name in dimsAttrs['lat']['attr'].items():
    setattr(latdimnc, attr, name)
londimnc = rootgrp.createVariable('lon', 'f4', ('lon',))
for attr, name in dimsAttrs['lon']['attr'].items():
    setattr(londimnc, attr, name)
del attr, name

var3dnc = rootgrp.createVariable('variables3d', 'f8', ('time',features3d,msllevels,'lat','lon'))
var2dnc = rootgrp.createVariable('variables2d', 'f8', ('time',features2d,'lat','lon'))

ncfile.close()
Exception has occurred: AttributeError
'netCDF4._netCDF4.EnumType' object has no attribute '_dimid'
  File "test_enum.py", line 47, in <module>
    var3dnc = rootgrp.createVariable('variables3d', 'f8', ('time',features3d,msllevels,'lat','lon'))

I don't understand how to handle the enums. Maybe I'm pushing the feature beyond its limits.

Marston commented 1 year ago

> Looks like your enum types are defined inside the wem group, so you would have to do
>
> nc['wem'].enumtypes

I'll test this later.

jswhit commented 1 year ago

You are trying to use the Enum types as dimensions instead of data types. Pass the Enum type as the datatype argument instead, for example:

var3dnc = rootgrp.createVariable('variables3d', features2d, ('time', 'lat', 'lon'))
Marston commented 1 year ago

Oh, I see. So this means I cannot have two enums in one variable, like this:
var3dnc = rootgrp.createVariable('variables3d', features2d, msllevels_t, ('time', 'lev', 'lat', 'lon') )

jswhit commented 1 year ago

No. An Enum is a datatype, and each variable has only one datatype.

Marston commented 1 year ago

Thank you.

I have a working solution now and better understand how to use enums.