Open visr opened 3 years ago
Thanks @visr, for helping the user on SO!
The _Unsigned attribute does not seem to be part of the CF convention.
Many links to the NetCDF best practices are broken, but I found it here:
- To be completely safe with unknown readers, widen the data type, or use floating point.
- You can use the corresponding signed types to store unsigned data only if all client programs know how to interpret this correctly.
- A new proposed convention is to create a variable attribute _Unsigned = "true" to indicate that integer data should be treated as unsigned.
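A minimal Python sketch of the first two options quoted above, using only the standard library. The helper name `widen_unsigned16` is hypothetical and the sample value is made up for illustration. It shows how an unsigned 16-bit value looks to a reader that treats it as signed, and how widening recovers it.

```python
import struct

def widen_unsigned16(values):
    """Map signed 16-bit values that really hold unsigned data to plain ints.

    Hypothetical helper: masking with 0xFFFF keeps the low 16 bits, so
    negative values wrap back into the range 0..65535.
    """
    return [v & 0xFFFF for v in values]

# 40000 does not fit in a signed short; a reader that ignores the unsigned
# intent sees the same two bytes as a negative number.
raw = struct.pack("<H", 40000)
as_signed = struct.unpack("<h", raw)[0]

print(as_signed)                      # -25536
print(widen_unsigned16([as_signed]))  # [40000]
```

Widening costs memory but is safe for any reader, which is the trade-off the first bullet describes.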
I think that point 2 is also interesting. It could be read as saying that such files (using the corresponding signed types to store unsigned data) should not be used for public distribution where you do not control the client program.
Does somebody know if the work on the "new proposed convention" is still ongoing? It also seems that this is specific to files written by old versions of NetCDF-Java. Does somebody know since which version NetCDF-Java has used native unsigned types?
Indeed, this is not a CF convention, only an old "proposed convention", which in practice still seems to be used (the example is new data), even though it shouldn't be. It seems like NetCDF-Java has had unsigned capabilities for a while; users are just not taking advantage of this when updating their old data model.
The most commonly used clients seem to have implemented support for this. In a way it's quite unfortunate, since it leads to potentially misinterpreted data like in the SO post. I'm not sure how difficult it would be to add support for _Unsigned. Potentially we can use reinterpret, like in my SO answer, and avoid copying the data. If we decide not to support it, then perhaps we should throw an error when we encounter it.
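To illustrate the two alternatives in this comment, here is a rough Python analogue, not this package's code. The helper `apply_unsigned` and the plain dict standing in for variable attributes are hypothetical, and Python's `memoryview.cast` stands in for the reinterpret mentioned above: it exposes the same buffer as unsigned shorts without copying, and raises an error for cases it does not handle.

```python
from array import array

def apply_unsigned(data, attrs):
    """Return a view of `data` honoring an `_Unsigned` attribute.

    Hypothetical helper: `data` is an array('h') of signed shorts and
    `attrs` is a plain dict standing in for the variable's attributes.
    """
    flag = str(attrs.get("_Unsigned", "false")).lower()
    if flag != "true":
        return memoryview(data)
    if data.typecode != "h":
        # Mirrors the "throw an error" alternative for unsupported cases.
        raise TypeError(f"_Unsigned not handled for typecode {data.typecode!r}")
    # Cast through bytes to reinterpret the same buffer as unsigned shorts.
    return memoryview(data).cast("B").cast("H")

signed = array("h", [-1, -25536, 7])
unsigned = apply_unsigned(signed, {"_Unsigned": "true"})
print(list(unsigned))  # [65535, 40000, 7]
```

Because the result is a view, later writes to `signed` are visible through `unsigned`, which is the no-copy property the comment is after.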
I read this StackOverflow question: https://stackoverflow.com/q/68135528, and through it I found out that there are apparently netCDFs out there with variables of type short, but if they have an attribute _Unsigned with value "true", then this data is supposed to be interpreted as unsigned short (which netCDF-4 also supports). I read some background in https://github.com/Unidata/netcdf4-python/issues/656 and it seems this is a bit of a heritage from netCDF-3. Since readers in other languages seem to support this, I guess perhaps we should too?
EDIT: see also my SO answer for an example file with some code.