Unidata / netcdf

NetCDF Users Group (NUG)
MIT License

Clarify language around attribute (and other?) names "commencing with underscore" #25

Open ethanrd opened 3 years ago

ethanrd commented 3 years ago

The NUG "Attribute Conventions" appendix states that

"Attribute names commencing with underscore ('_') are reserved for use by the netCDF library."

This should be changed to "reserved for use by netCDF libraries" (or "netCDF implementations").

The phrase

"names commencing with underscore are reserved for system use"

is used in the following sections:

Note: Or, to be super clear, maybe "software that directly implements reading and writing of netCDF datasets". Except that doesn't cover libraries that wrap the HDF library. We could drop "directly", or switch from "reading and writing" to "(un)encodes". Yuk! Maybe being less explicit is better.

dopplershift commented 3 years ago

Maybe "reserved for the netCDF format itself and not intended for use by end-users"?

DennisHeimbigner commented 3 years ago

Possibly relevant is that the streaming formats (DAP2 and DAP4) insert underscore-prefixed attributes that have special meaning to those protocols. So data being streamed should also avoid underscore attributes, to prevent name conflicts.

lesserwhirls commented 3 years ago

As @DennisHeimbigner points out, we should keep in mind that while some of these underscore attributes are encoded directly in the file, others are added after the fact (occasionally encoded directly, but more often added in memory by a specific implementation). An example of the latter from the netCDF-Java library is the _Coordinates attribute, which IOSPs can add to help the coordinate system layer of netCDF-Java (even for existing netCDF files). Examples from the netCDF-C side are the _IsNetcdf4 and _SuperblockVersion attributes. The attributes added by DAP2 and DAP4 are another example.

I think we need to make a clear distinction between the two cases, with a heavy emphasis on those encoded into the on-disk format. For example, we should at a minimum answer the questions "what underscore attributes must be encoded into a file in order for it to be considered a netCDF file?" and "what format should their values take?". Since netCDF-4 files are not versioned, we can only say "must" about things that are true of files created with netCDF-C v4.0.0 onward. We can then add recommendations (strong ones, even) about new attributes that have since appeared on the scene (e.g. _NCProperties), including when they first started showing up and the motivation for their addition, but we cannot say "must" at this point.

Once that's settled, we could add a more generic statement: "Any other underscore attributes, whether encoded directly in the file or added in-memory, are reserved for use by individual netCDF libraries." The _Coordinates attribute falls under this category. I would go even further and state that, in general, encoding new underscore attributes into a file is strongly discouraged in favor of using existing metadata conventions.
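
To make the proposed distinction concrete, here is a minimal sketch of how a tool might sort attribute names into the categories described above. The function names, category labels, and the two sets are hypothetical and contain only the attribute names mentioned in this thread; this is not an authoritative list and not the API of any netCDF library.

```python
# Hypothetical sketch: sort attribute names into the categories discussed
# in this thread. The sets below contain only names mentioned above and are
# NOT an authoritative list from any netCDF library.

# Underscore attributes that, as described in this thread, are written
# into the file itself.
FILE_ENCODED = frozenset({"_NCProperties"})

# Underscore attributes that, as described in this thread, an
# implementation adds after the fact (typically in memory).
IMPLEMENTATION_ADDED = frozenset(
    {"_IsNetcdf4", "_SuperblockVersion", "_Coordinates"}
)


def classify_attribute(name: str) -> str:
    """Return an illustrative category label for an attribute name."""
    if not name.startswith("_"):
        return "user"                  # not reserved by the underscore rule
    if name in FILE_ENCODED:
        return "file-encoded"
    if name in IMPLEMENTATION_ADDED:
        return "implementation-added"
    return "reserved-other"            # any other leading-underscore name


def reserved_names(names):
    """Return the names that fall under the leading-underscore reservation."""
    return [n for n in names if classify_attribute(n) != "user"]
```

For example, `reserved_names(["units", "_NCProperties", "_MyTag"])` would return `["_NCProperties", "_MyTag"]`, flagging the made-up `_MyTag` as a name that a data producer should avoid writing.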

ethanrd commented 3 years ago

Namespaces for attribute names? (Not that this helps the current issue.)

The LinkedData for netCDF folks are using bald__ (Binary Array Linked Data) as a pseudo namespace prefix. CF had an attribute namespace discussion (Trac 27) years ago that was leaning toward using a colon (':'), but then the discussion stalled.

Would it be appropriate to have a "standard" namespace for attribute names defined in the NUG? Or are CF and other standards a better place for that? (Having it in the NUG would allow us to reserve "nc__" or whatever for future use.)
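
As a rough illustration of the pseudo namespace idea, the sketch below splits an attribute name on a separator such as the double underscore in bald__ or the colon considered in the CF discussion. The function name `split_namespace` and the example attribute names are hypothetical; no netCDF library or convention defines this behavior.

```python
# Hypothetical sketch: split an attribute name into (prefix, local name)
# using a pseudo namespace separator. Nothing here is defined by the NUG,
# CF, or any netCDF library.
from typing import Optional, Tuple


def split_namespace(name: str, sep: str = "__") -> Tuple[Optional[str], str]:
    """Return (prefix, local_name), or (None, name) if there is no prefix."""
    prefix, found, local = name.partition(sep)
    # Require a non-empty prefix and a non-empty local part, so a name that
    # merely starts or ends with the separator is left untouched.
    if not found or not prefix or not local:
        return (None, name)
    return (prefix, local)
```

With the default separator, `split_namespace("bald__example")` gives `("bald", "example")`, while a plain reserved name such as `_NCProperties` comes back unchanged as `(None, "_NCProperties")`. Passing `sep=":"` would model the colon-based variant instead.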