Access Enum in NC4 file #1228

Marston commented 1 year ago

I'm trying to access an enum using the Python module netCDF4. The enum is written in a netCDF4 file. I can see the enum as a type in the file, but I cannot access it from Python. Here's the ncdump output:

group: wem {
    ubyte enum features_3d {Temperature = 0,
        Pseudo_adiabatic_potential_temperature = 1,
        Dew_point_temperature = 2, Specific_humidity = 3,
        Relative_humidity = 4, Wind_direction = 5, Wind_speed = 6,
        Density = 7, Potential_temperature = 8, Humidity_mixing_ratio = 9,
        Geopotential_height = 10, Absolute_vorticity = 11,
        Relative_vorticity = 12, Relative_divergence = 13,
        Ice_water_mixing_ratio = 14, Density_Altitude = 15,
        Height_D-values = 16, U_wind = 17, V_wind = 18,
        Cloud_mixing_ratio = 19, Rain_water_mixing_ratio = 20} ;
    ubyte enum features_2d {Temperature_2m = 0, Temperature_tropopause = 1,
        Latent_heat_flux = 2, Sensible_heat_flux = 3,
        Surface_Skin_Temperature = 4, Precipitable_water = 5,
        Absolute_Humidity = 6, Maximum_absolute_humidity = 7,
        Horizontal_moisture_convergence = 8, Potential_evaporation_rate = 9,
        Vertical_speed_shear = 10, Pressure_at_the_tropopause = 11,
        Pressure_reduced_to_MSL = 12, Geopotential_height_fzglvl = 13,
        Geopotential_height_tro = 14, Model_terrain_height = 15} ;
    ubyte enum msllevels_t {\0 = 0, \1000 = 1, \3000 = 2, \6000 = 3,
        \9000 = 4, \12000 = 5, \15000 = 6, \18000 = 7, \21000 = 8,
        \24000 = 9, \27000 = 10, \30000 = 11, \33000 = 12, \36000 = 13,
        \39000 = 14, \41000 = 15, \45000 = 16, \48000 = 17, \51000 = 18,
        \54000 = 19, \57000 = 20, \60000 = 21, \65000 = 22, \70000 = 23,
        \75000 = 24, \80000 = 25, \85000 = 26, \90000 = 27, \95000 = 28,
        \100000 = 29, \110000 = 30} ;
        time = UNLIMITED ; // (1 currently)
        lat = 721 ;
        lon = 1440 ;
        msllevels = 31 ;
        features3d = 21 ;
        features2d = 16 ;
        double time(time) ;
                time:units = "seconds since 1990-01-01 00:00:00" ;
                time:long_name = "time" ;
                time:axis = "T" ;
                time:calendar = "Standard" ;
        float lat(lat) ;
                lat:units = "degrees_north" ;
                lat:long_name = "latitude" ;
                lat:axis = "Y" ;
        float lon(lon) ;
                lon:units = "degrees_east" ;
                lon:long_name = "longitude" ;
                lon:axis = "X" ;
        int msllevels(msllevels) ;
        int features3d(features3d) ;
        int features2d(features2d) ;
        double variables3d(time, features3d, msllevels, lat, lon) ;
        double variables2d(time, features2d, lat, lon) ;
  } // group wem

But I cannot see it when I print the Dataset object in Python:

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    groups: wem

How do I access the enums? Was it written in the file incorrectly?

jswhit commented 1 year ago

There is an enum datatype defined in the file, but no variables were created with that type. You can access the enum type with

nc.enumtypes['msllevels_t']

where nc is the Dataset object.

Marston commented 1 year ago

Thanks for the access code. But how can I create the file so that it uses the enum? I thought my file was using the enum, but clearly I'm missing the mark. Could you please show me an example of how to create variables using the enums?

Marston commented 1 year ago

I'm unable to get that enum command to work: nc.enumtypes['msllevels_t']. It says that nc has no method called enumtypes.


['CompoundType', 'Dataset', 'Dimension', 'EnumType', 'Group', 'MFDataset', 'MFTime', 'NC_DISKLESS', 'NC_PERSIST', 'VLType', 'Variable', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__has_cdf5_format__', '__has_nc_create_mem__', '__has_nc_inq_format_extended__', '__has_nc_inq_path__', '__has_nc_open_mem__', '__has_parallel4_support__', '__has_pnetcdf_support__', '__has_rename_grp__', '__hdf5libversion__', '__loader__', '__name__', '__netcdf4libversion__', '__package__', '__path__', '__spec__', '__version__', '_netCDF4', 'chartostring', 'date2index', 'date2num', 'default_encoding', 
'default_fillvals', 'get_chunk_cache', 'getlibversion', 'glob', 'is_native_big', 'is_native_little', 'ma', 'num2date', 'numpy', 'pathlib', 'posixpath', 'set_chunk_cache', 'stringtoarr', 'stringtochar', 'subprocess', 'sys', 'unicode_error', 'utils', 'warnings', 'weakref']

The EnumType that is there is the constructor so this is not what I'm looking for.

jswhit commented 1 year ago

There is an example in the docs at http://unidata.github.io/netcdf4-python/#enum-data-type

Marston commented 1 year ago

I see how to create it. Thanks. But why doesn't your example work? Is it because the ncfile was not made correctly?

jswhit commented 1 year ago

Here's a self-contained example:

import numpy as np
from netCDF4 import Dataset
# Enum type example.
f = Dataset('clouds.nc','w')
# python dict describing the allowed values and their names.
enum_dict = {'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0,
'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1,
'Stratocumulus': 3}
# create the Enum type called 'cloud_t'.
cloud_type = f.createEnumType(np.uint8,'cloud_t',enum_dict)
time = f.createDimension('time',None)
# create a 1d variable of type 'cloud_type' called 'primary_cloud'.
# The fill_value is set to the 'Missing' named value.
cloud_var = f.createVariable('primary_cloud',cloud_type,'time',
                             fill_value=enum_dict['Missing'])
# write some data to the variable.
cloud_var[:] = [enum_dict['Clear'],enum_dict['Stratus'],enum_dict['Cumulus']]
# close file, reopen it.
f.close()
f = Dataset('clouds.nc')
cloud_var = f.variables['primary_cloud']

jswhit commented 1 year ago

Looks like your enum types are defined inside the wem group, so you would have to do

Marston commented 1 year ago

I created this simple test of what I did, trying to use the enums in variable creation, but I keep getting the error shown after the code:

import netCDF4 as nc
import numpy as np

ncfile = nc.Dataset('test.nc', mode='w', format='NETCDF4') 
rootgrp = ncfile.createGroup('wem')

dimNames = []
dimsnc = {}
varnc = dict()
dimsAttrs = {'lat': {'attr': {'units': 'degrees_north', 'long_name': 'latitude', 'axis': 'Y'}, 'dtype': 'f4'},
                     'lon': {'attr': {'units': 'degrees_east', 'long_name': 'longitude', 'axis': 'X'}, 'dtype': 'f4'},
                     'plevels': {'attr': {'units': 'Pa', 'long_name': 'isobaric pressure levels', 'axis': 'Z'}, 'dtype': 'f4'},
                     'msllevels': {'attr': {'units': 'ft', 'long_name': 'height in msl', 'axis': 'Z'}, 'dtype': 'f4'},                     
                     'time': {'attr': {'units': 'seconds since 1990-01-01 00:00:00', 'long_name': 'time', 
                                       'axis': 'T', 'calendar': 'Standard'}, 'dtype': 'f8'},
                     'features2d': {'attr': {'units':'-', 'long_name': 'Names of GALWEM 2D variables mapped to indices'}},
                     'features3d': {'attr': {'units':'-', 'long_name': 'Names of GALWEM 3D variables mapped to indices'}}
                     }
print('Building the feature dictionaries...')
feat_dict3d = {'Temperature': 0, 'Pseudo_adiabatic_potential_temperature': 1}
feat_dict2d = { 'Temperature_2m': 0, 'Temperature_tropopause': 1 }
print(f'Creating the dimensions for nc4 file...')                                    
nlat = 5
nlon = 5
nlev = 5
lev_dict = {'0': 0, '1000': 1, '3000': 2, '6000': 3, '9000': 4}

features3d = rootgrp.createEnumType(np.uint8,'features3d_t',feat_dict3d)
features2d = rootgrp.createEnumType(np.uint8,'features2d_t',feat_dict2d)
msllevels = rootgrp.createEnumType(np.uint8,'msllevelset_t', lev_dict)

rootgrp.createDimension('time', None)
rootgrp.createDimension('lat', nlat)
rootgrp.createDimension('lon', nlon)

timedimnc = rootgrp.createVariable('time', 'f8', ('time',))
for attr, name in dimsAttrs['time']['attr'].items():
    setattr(timedimnc, attr, name)
latdimnc = rootgrp.createVariable('lat', 'f4', ('lat',))
for attr, name in dimsAttrs['lat']['attr'].items():
    setattr(latdimnc, attr, name)
londimnc = rootgrp.createVariable('lon', 'f4', ('lon',))
for attr, name in dimsAttrs['lon']['attr'].items():
    setattr(londimnc, attr, name)
del attr, name

var3dnc = rootgrp.createVariable('variables3d', 'f8', ('time',features3d,msllevels,'lat','lon'))
var2dnc = rootgrp.createVariable('variables2d', 'f8', ('time',features2d,'lat','lon'))

Exception has occurred: AttributeError
'netCDF4._netCDF4.EnumType' object has no attribute '_dimid'
  File "test_enum.py", line 47, in <module>
    var3dnc = rootgrp.createVariable('variables3d', 'f8', ('time',features3d,msllevels,'lat','lon'))

I don't understand how to handle the enums. Maybe I'm pushing the feature beyond its limits.

Marston commented 1 year ago

Looks like your enum types are defined inside the wem group, so you would have to do


I'll test this later.

jswhit commented 1 year ago

You are trying to use the Enum types as dimensions instead of data types. Instead, try something like

var3dnc = rootgrp.createVariable('variables3d', features2d, ('time','lat','lon'))

Marston commented 1 year ago

Oh, I see. So this means I cannot have 2 enums in one variable:
var3dnc = rootgrp.createVariable('variables3d', features2d, msllevels_t, ('time', 'lev', 'lat', 'lon') )

jswhit commented 1 year ago

No, an Enum is a datatype and each variable has only one of those

Marston commented 1 year ago

Thank you.

I have a working solution now and understand better the usage of ENUM.