Apollo3zehn closed 4 months ago
Yes, I thought about this as well. I couldn't find anything definitive in the spec, and this is an example file where the names are clearly UTF-8 encoded (though I don't know how it was created). Pragmatically, I decided to switch to UTF-8: it's compatible with ASCII, so I don't see any downside. It actually made me consider just switching to UTF-8 everywhere, even where the encoding is specifically defined in the spec and the file. I would be interested in what you think about this.
I am not sure why I originally thought that this could be an ASCII string :shrug:
I also can't find anything in the spec that says it's ASCII only, and I checked the other parts of my code that rely on the "get local heap object name" function; none of them assume ASCII. For example, in the spec, the External File List Slot has a field called Name Offset in Local Heap, and there is no reference to ASCII or UTF-8 in the field description either. Another structure that stores a name on the local heap is the Symbolic Link Scratch-pad, and again, nothing is specified.
Additionally, the HDFView software can display that specific group name without problems. So to summarize, I think we are safe to assume that decoding the local heap content as UTF-8 is ok here.
And none of my tests fail with the change.
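To illustrate the idea, here is a minimal Java sketch of decoding a null-terminated local heap name as UTF-8. The class `LocalHeapNameSketch`, the method `readName`, and the sample heap bytes are hypothetical and only for illustration; this is not the actual code from either library. It assumes the heap data segment is already available as a `ByteBuffer` and that names are null-terminated.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not taken from jhdf or any other library.
final class LocalHeapNameSketch {

    // Reads the null-terminated name starting at `offset` in the heap data
    // segment and decodes it as UTF-8. Pure ASCII names decode identically,
    // because ASCII is a subset of UTF-8.
    static String readName(ByteBuffer heapData, int offset) {
        int end = offset;
        // Scan forward to the terminating NUL byte (or the end of the buffer).
        while (end < heapData.limit() && heapData.get(end) != 0) {
            end++;
        }
        byte[] raw = new byte[end - offset];
        // Work on a duplicate so the caller's position is left untouched.
        ByteBuffer view = heapData.duplicate();
        view.position(offset);
        view.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up heap content: one ASCII name and one name with non-ASCII characters.
        byte[] heap = "ascii_group\0gruppe_\u00e4\u00f6\u00fc\0".getBytes(StandardCharsets.UTF_8);
        ByteBuffer heapData = ByteBuffer.wrap(heap);
        System.out.println(readName(heapData, 0));
        System.out.println(readName(heapData, 12));
    }
}
```

The first name is 11 ASCII bytes followed by a NUL, so the second name starts at offset 12. Both decode correctly with the same UTF-8 code path.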
> ... actually made me consider just switching to UTF8 everywhere even where the encoding is specifically defined in the spec and the file. Would be interested what you think on this?
I also thought about this but have not dared to do it yet. It would need some investigation first to detect possible problems. One might be that writing such files could lead to incompatibilities with the C library. However, just reading UTF-8 everywhere should be fine, I would say.
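On the writing side, one cautious option would be to flag names that are not pure ASCII before they are written, so that potential incompatibilities can at least be investigated. The sketch below is only an assumption about how such a check could look; `NameEncodingCheck` and `isPureAscii` are hypothetical names, and how the C library actually handles non-ASCII names is not established here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any existing library.
final class NameEncodingCheck {

    // Returns true if every character of the name can be encoded as US-ASCII,
    // in which case its ASCII and UTF-8 byte representations are identical.
    static boolean isPureAscii(String name) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("group_1"));
        System.out.println(isPureAscii("gruppe_\u00e4"));
    }
}
```

A writer could use such a check to warn or fall back to a stricter mode, while the reader keeps decoding everything as UTF-8.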
https://github.com/jamesmudd/jhdf/issues/539#issuecomment-1923308452 https://github.com/jamesmudd/jhdf/pull/544/commits/14e939d697b2a90c8bdf392089859e29e3579654