Closed by matvore 6 years ago
I didn't know char casting did that; it was not the intended behavior.
So I redesigned the approach for parsing ASCII property lists. The parser now works on a char array instead of a byte array. An encoding can be specified explicitly; otherwise the parser attempts to detect it (UTF-8, UTF-16, UTF-32 or ASCII). I created a feature branch for this reworked parser: https://github.com/3breadt/dd-plist/tree/asciipropertylist-configurable-encoding
What do you think?
That's great! That commit would definitely fit my requirements.
Currently, ASCIIPropertyListParser takes a byte[] and widens each byte to a UTF-16 char by padding it with an extra 0x00 byte. If the byte is >= 0x80, it pads with 0xff instead. This means that if the bytes are in the 7-bit ASCII range, everything is fine. But if not, 0x80, for example, becomes 0xff80 (half-width katakana TA), which I don't believe corresponds to any real encoding system.
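For anyone following along, the padding described above is what Java's byte-to-char cast does on its own: byte is signed, so the value is sign-extended before it becomes a char. Here is a minimal standalone sketch of that effect. The class and method names are made up for illustration and are not taken from dd-plist.

```java
public class ByteToCharDemo {
    // Direct cast: byte is signed, so it is sign-extended to int
    // and then narrowed to a 16-bit char.
    static char castDirectly(byte b) {
        return (char) b;
    }

    // Masking with 0xFF keeps only the low 8 bits, treating the byte as unsigned.
    // Note this is a Latin-1 style mapping, not a UTF-8 decode.
    static char castMasked(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte ascii = 0x41;        // 'A', inside the 7-bit range
        byte high = (byte) 0x80;  // first value outside the 7-bit range

        System.out.printf("%04x%n", (int) castDirectly(ascii)); // 0041
        System.out.printf("%04x%n", (int) castDirectly(high));  // ff80
        System.out.printf("%04x%n", (int) castMasked(high));    // 0080
    }
}
```

Masking avoids the 0xff padding, but it still does not decode multi-byte sequences, so it is not a substitute for decoding with a real charset.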
The options are to:
I think UTF-8 is a better default. The default system encoding is good for backwards compatibility, but this feature (characters outside 7-bit ASCII) has never worked at all before, so backwards compatibility is not really necessary here. The encoding can also be made configurable if the need presents itself.
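To illustrate what defaulting to UTF-8 could look like, here is a rough sketch that decodes the input bytes into a char array using only the standard library. The class and method names are hypothetical, and this is not how the feature branch is necessarily implemented.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {
    // Decode the raw bytes as UTF-8 and expose them as chars for a char-based parser.
    // Malformed input is replaced with U+FFFD by the String constructor.
    static char[] decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] input = {(byte) 0xC3, (byte) 0xA9};
        char[] chars = decodeUtf8(input);
        System.out.printf("%d char(s), first = %04x%n", chars.length, (int) chars[0]);
    }
}
```

Pure 7-bit ASCII input decodes to the same chars as before, so existing files in that range would be unaffected by the change of default.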