Closed mhidas closed 3 years ago
Maybe unnecessary!?
If you are putting numbers in attributes, you don't care much about their precision anyway...
You do care actually, one example is the use of CF packed data.
Yeah, that's essentially the use case I'm thinking of here, though I think `_FillValue`, `valid_min`, `valid_max`, and `valid_range` should always be the same type as the variable, not just for packed data. (I haven't tested it, but I would think this is already ensured by the netCDF4 package for `_FillValue`.)
So, perhaps rather than making the template schema more complicated, the behaviour of `DatasetTemplate` could be to automatically cast `_FillValue`, `valid_min`, `valid_max`, and `valid_range` to the variable's type, where specified. This of course wouldn't work for global attributes, but perhaps @ocehugo's comment applies to them? I can't think of any particular case where we'd want to specify a global attribute's type.
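A minimal sketch of that auto-casting idea (the helper name and behaviour here are hypothetical, not existing `DatasetTemplate` code):

```python
import numpy as np

# Variable attributes whose type should always match the variable's dtype
CAST_ATTRS = ("_FillValue", "valid_min", "valid_max", "valid_range")

def cast_special_attrs(attrs, var_dtype):
    """Cast the special attributes above to var_dtype, where present.

    Hypothetical helper illustrating the proposal; all other
    attributes pass through unchanged.
    """
    out = dict(attrs)
    for name in CAST_ATTRS:
        if name in out:
            # np.asarray handles both scalars and lists like valid_range
            out[name] = np.asarray(out[name]).astype(var_dtype)
    return out
```

e.g. `cast_special_attrs({"_FillValue": -999.0}, np.float32)` would yield a float32 fill value while leaving string attributes untouched.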
My preference is that any numeric attribute that is related to a particular variable should have the same type as that variable. Exceptions are the `scale_factor`/`add_offset` attributes for packed variables, where their type dictates the type/precision of the unpacked data.
For global attributes, things like `geospatial_lat_min`/`max` should have the same type as the latitude variable, for example.
> This of course wouldn't work for global attributes, but perhaps @ocehugo's comment applies to them? I can't think of any particular case where we'd want to specify a global attribute's type.
An example: WMO codes in global attributes. You want them as integers, but they could be wrongly written as floats in a netCDF file.
> My preference is that any numeric attribute that is related to a particular variable should have the same type.
There is one tricky possibility: what if you want an integer attribute for a floating-point variable? (e.g. some kind of overall status flag)
> For global attributes, things like geospatial_lat_min/max should have the same type as the latitude variable, for example.
I'm not sure that's necessary, but it would automatically be the case if we do #17 .
> An example: WMO codes in global attributes. You want them as integers, but they could be wrongly written as floats in a netCDF file.
This can already be done via the JSON. If a number is specified with no decimal point, it's read as an integer.
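For illustration (the attribute names are just examples):

```python
import json

# json parses a number without a decimal point as int, and with one as float
parsed = json.loads('{"wmo_platform_code": 12345, "geospatial_lat_min": -42.0}')
print(type(parsed["wmo_platform_code"]))   # <class 'int'>
print(type(parsed["geospatial_lat_min"]))  # <class 'float'>
```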
Actually, following on from my last comment above, the rule for variable attributes could be to cast all floating point attributes to the variable's type.
By the way, remember that this is only an issue for values specified in a JSON template. Attribute types can be easily controlled in Python code.
My simpleton argument was too shallow. My point is that this feature is unnecessary.
First, we should avoid automatic internal casting/conversion, because the Python json package already does the right thing (it converts/recasts automatically).
I can see a point in providing a string in the template (something like "np.float32(1234.0005)"), but this would create boilerplate code and eval calls, and would change the precision unnecessarily. As said, the Python json package already casts to the correct precision.
Maybe this issue is due to a confusion regarding type names:
Python's float is actually np.double, because a Python float is a C double. Hence, json can read float32, float64 and ints at any precision up to 64 bits; everything non-integer ends up as a float64. I can hardly see anyone using float128, and imagine how many digits they would have to type in the template to match the exact precision. We should dump the JSON format anyway in that case...
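That is easy to verify (a quick illustration):

```python
import json
import sys

# Every JSON real number parses to a Python float, i.e. a C double (IEEE 754 binary64)
x = json.loads("1234.0005")
print(type(x))                  # <class 'float'>
print(sys.float_info.mant_dig)  # 53 mantissa bits, i.e. double precision
```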
> You do care actually, one example is the use of CF packed data.
The packed attributes are nonsense for the template. `scale_factor` and `add_offset` are usually set up automatically for compression. You have to know the data to set up both; at template time the data is not known, so setting these is pretty much noise, given that any attempt to compress after that will change these values.
> My preference is that any numeric attribute that is related to a particular variable should have the same type. Exceptions are for scale_factor / add_offset variable attributes for packed variables, where their type dictates the type/precision of the unpacked data.
We can't do it in the package [there are ints, float32, float64, booleans and bytes in CDL, for example], but you can enforce a style in the IMOS template, for example...
Maybe the only thing the JSON parser is not good for is providing bytes (CDL allows them in attributes, I think).
Currently global & variable attributes are just specified as key:value pairs, with the value either a string, a list, or a number, to be converted into netCDF attributes by the netCDF4 library's `setncattr` and `setncatts` methods. This is fine in most cases, except when we explicitly want to set the type of an attribute to e.g. double (e.g. to match the data type of the variable).

Within Python code it's easy to specify the attribute type by setting its value to the appropriate numpy object (e.g. `np.float32(1.234)`). However, when numeric values are specified in a JSON template, they are automatically converted into int, long, or float (https://docs.python.org/2/library/json.html#json-to-py-table). We need to allow attribute values in the template to be specified as another (JSON) object, with properties "type" and "data".
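A sketch of how such a template value could be handled (the `type`/`data` object format is the proposal here, and `parse_attr` is a hypothetical helper, not existing code):

```python
import json
import numpy as np

def parse_attr(value):
    """Turn a template attribute value into a typed object.

    Plain strings/numbers/lists pass through unchanged; a
    {"type": ..., "data": ...} object is cast to the named numpy
    dtype. (Proposed format only, not implemented.)
    """
    if isinstance(value, dict) and {"type", "data"} <= value.keys():
        return np.dtype(value["type"]).type(value["data"])
    return value

template = json.loads('{"valid_min": {"type": "float32", "data": -5.0}, "units": "m"}')
attrs = {k: parse_attr(v) for k, v in template.items()}
print(type(attrs["valid_min"]))  # <class 'numpy.float32'>
```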