trexfeathers opened this issue 1 year ago
Hey @gavinevans, we're currently a bit low on resources. Is this something you'd be interested in working on?
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days' time.
This issue hasn't yet been resolved.
This new activity has prompted a very useful discussion in @SciTools/peloton:
NetCDF only supports ASCII (i.e. every character must be 1 byte). Iris could do something with non-ASCII characters, but it would be Iris-specific: no other library would know how to interpret it.
We're quite uncomfortable making an explicit decision here, because the Iris devs are not exposed to all the possible use cases. As there is no official convention, we would prefer individual users/teams to define their own encode/decode rules, since they alone know the specifics (e.g. how many bytes are needed). This would probably take the form of a bytes array (rather than a character array), with user-written functions to write and read it correctly. @gavinevans @brhooper how does this sound?
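A minimal sketch of the kind of user-written encode/decode functions described above, assuming UTF-8 as the team's chosen encoding and NUL-padding to a fixed width. The names `encode_fixed_width` and `decode_fixed_width` are hypothetical and not part of Iris. The key point is that the width is measured in encoded bytes, not characters, so no multi-byte character is ever truncated.

```python
def encode_fixed_width(strings, encoding="utf-8"):
    """Encode strings to equal-length byte rows, padded with NUL bytes.

    Hypothetical helper (not Iris API). Returns (width, rows), where width
    is the longest *encoded* length in bytes, and each row is a bytes
    object of exactly that width.
    """
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(b) for b in encoded), default=0)
    rows = [b.ljust(width, b"\x00") for b in encoded]
    return width, rows


def decode_fixed_width(rows, encoding="utf-8"):
    """Reverse encode_fixed_width: strip trailing NUL padding, then decode."""
    return [row.rstrip(b"\x00").decode(encoding) for row in rows]


names = ["Zürich", "São Paulo", "London"]
width, rows = encode_fixed_width(names)
print(width)  # 10: "São Paulo" is 9 characters but 10 bytes in UTF-8
print(decode_fixed_width(rows) == names)  # True
```

Both the writer and the reader must agree on the encoding, which is why this sketch keeps it as an explicit parameter rather than guessing. Stripping trailing NULs also assumes no value legitimately ends in a NUL character.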
If anyone is aware of an 'official' convention that Iris should follow, please speak up 😊
I am not sure I understand what you mean by "NetCDF only supports ASCII (i.e. every character must be 1 byte)". Is the problem specific to string/char auxiliary coordinate values?
We believe you can use Unicode in NetCDF names and in string attributes, but NOT in any data arrays.
🐛 Bug Report
From @gavinevans
Attempting to save a `Cube` that includes a string `AuxCoord` with non-ASCII (i.e. Unicode) characters raises an exception (traceback below).
How To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
The cube should save without an exception (as happens when using the commented-out line above).
Environment
Iris v3.2.1.post0 and v3.4.0
Additional context
Related: #4101, #4412
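I think the fix will hinge on allowing for the extra bytes needed to store encoded Unicode characters. We currently divide the dtype's item size by 4 (NumPy stores `U` strings as 4 bytes per character), which gives the number of characters. Using that as the string dimension length means I think we are always assuming a Unicode string can be converted to an ASCII one, i.e. one byte per character: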
https://github.com/SciTools/iris/blob/fc302c9c08c292cb2075d2dd249bcbdfacf08da8/lib/iris/fileformats/netcdf/saver.py#L1881-L1883
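To illustrate the concern above: `itemsize // 4` on a NumPy `U` array recovers the maximum character count, but UTF-8 needs between 1 and 4 bytes per character. This stdlib-only sketch compares the two lengths; `required_depths` is a hypothetical helper, not Iris code, and UTF-8 is assumed as the target encoding.

```python
def required_depths(strings, encoding="utf-8"):
    """Return (max characters, max encoded bytes) over the given strings.

    Hypothetical helper (not Iris API). The first value corresponds to
    what itemsize // 4 gives for a NumPy "U" array; the second is the
    string-dimension length actually needed to hold the encoded bytes.
    """
    max_chars = max((len(s) for s in strings), default=0)
    max_bytes = max((len(s.encode(encoding)) for s in strings), default=0)
    return max_chars, max_bytes


print(required_depths(["ascii", "only"]))  # (5, 5): 1 byte per character
print(required_depths(["°C", "日本語"]))    # (3, 9): "日本語" needs 9 bytes
```

For pure-ASCII data the two numbers agree, which is why the current behaviour works in the common case; as soon as any value contains a multi-byte character, a dimension sized by character count is too short.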
Could changing this have consequences for loading too?
Traceback with Iris v3.4:
```
Traceback (most recent call last):
  File ".../iris/lib/2023-01-03_gavin.py", line 17, in