Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

Enumerations become strings when setting the ncML attribute enhance="all" #1042

Open risquez opened 6 years ago

risquez commented 6 years ago

I would like to confirm whether the behavior I observe is intended or a bug.

I am interested in how enumerations are handled when they are defined in an ncML file and a netCDF file is then created from it.

In short:

When I define an enumeration and use enhance="all", a variable of that enumeration type becomes a string (no longer an enumeration), and I can write any value into it, not only the values defined in the enumeration. Is this intended or a bug?

This behavior differs from the netCDF-C library's approach. The netCDF-C library always works with the integer value and leaves the conversion to strings to the user/programmer. From the netCDF-C documentation:

Enums are based on any integer types. The underlying integer type is what is stored in the file.

Long explanation:

Following the NetcdfDataset Tutorial (enhance):

When using ConvertEnums enhance mode, Variables of type enum are promoted to String types and data is automatically converted using the EnumTypedef objects, which are maps of the stored integer values to String values.
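
To make the quoted behavior concrete, here is a minimal Python sketch of that kind of integer-to-string promotion. It is only a conceptual illustration and not netCDF-Java code: ENUM_MAP and convert_enums are hypothetical names, and the labels are taken from the ssd_type enumeration used in the example of this report.

```python
# Hypothetical stand-in for an EnumTypedef: stored integer -> string label.
# The labels come from the ssd_type enumeration in this report.
ENUM_MAP = {0: "500m", 1: "1km", 2: "2km"}


def convert_enums(stored_values, enum_map):
    """Replace each stored integer with its label.

    Raises KeyError if an integer has no entry in the map.
    """
    return [enum_map[value] for value in stored_values]


labels = convert_enums([2, 0, 1], ENUM_MAP)
print(labels)
```

After such a conversion the reader only sees strings; nothing in the result records that the values originally came from a closed set.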

OK, I understand, but I expected that, although the variable becomes a netCDF string, its set of values would still be limited to the labels in the enumeration mapping. This is not the case. Let me give an example:

I created the following ncML:

<?xml version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="all">

  <ncml:enumTypedef name="ssd_type" type="enum1">
      <ncml:enum key="0">500m</ncml:enum>
      <ncml:enum key="1">1km</ncml:enum>
      <ncml:enum key="2">2km</ncml:enum>
  </ncml:enumTypedef>

  <ncml:variable name="ssd" shape="" type="enum1" typedef="ssd_type">
  </ncml:variable>

</ncml:netcdf>

Note that I use the enhance="all" ncML attribute, and that ssd is a variable of the ssd_type enumeration data type.

I create a netCDF file using the netCDF-Java library (note: a preview of v5):

java -Xmx1g -classpath netcdfAll-5.0.0-20180211.132322-244.jar ucar.nc2.write.Nccopy --input test.ncml --output test.nc --format netcdf4

The output netCDF file that I get is this (using ncdump):

netcdf test {
types:
    byte enum ssd_type {\500m = 0, \1km = 1, \2km = 2} ;
variables:
    string ssd ;
// global attributes:
    :_CoordSysBuilder = "ucar.nc2.dataset.conv.DefaultConvention" ;
data:
    ssd = _ ;
}

Note that ssd is a variable of string data type. Ok, as described in the tutorial.

What surprises me is that I can now populate the ssd variable with any data. For example, using Python I can write "wrong" into the variable (!), although that is not one of the allowed values in the enumeration (500m, 1km, 2km).

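from netCDF4 import Dataset

# Open the file created above in append mode and write an arbitrary string.
ncid = Dataset('test.nc', 'a')
ncid.variables['ssd'][0] = 'wrong'  # not one of 500m, 1km, 2km
ncid.close()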

And now the output from ncdump is:

netcdf test {
types:
    byte enum ssd_type {\500m = 0, \1km = 1, \2km = 2} ;
variables:
    string ssd ;
// global attributes:
    :_CoordSysBuilder = "ucar.nc2.dataset.conv.DefaultConvention" ;
data:
    ssd = "wrong" ;
}

An enumerated variable becomes a string and therefore it accepts any value. Please, is this the intended behavior or a bug?

DennisHeimbigner commented 6 years ago

Yes, this is intended behavior. Restricting the value set would in effect require us to implement a type that is equivalent to enum.
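
Since the enhanced variable is a plain string, any restriction to the enumeration labels has to be enforced by whoever writes the data. A minimal sketch of such a check in Python follows; ALLOWED_SSD and checked_label are hypothetical names introduced for illustration, not part of netCDF-Java or netCDF4-python.

```python
# Labels from the ssd_type enumeration in the report above.
ALLOWED_SSD = ("500m", "1km", "2km")


def checked_label(value, allowed=ALLOWED_SSD):
    """Return value unchanged if it is an allowed label, else raise ValueError."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value


print(checked_label("1km"))
try:
    checked_label("wrong")
except ValueError as err:
    print("rejected:", err)
```

With the Python snippet from the report, the assignment could then be written as ncid.variables['ssd'][0] = checked_label(value), so that "wrong" is rejected before it reaches the file.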