Open lblatchford opened 3 years ago
It looks like that is a COM field in the JPEG file, which holds comments. For this field, the JPEG specification says "the interpretation is left to the application", so there is no standard encoding for it, and the same likely applies to all other string fields in a JPEG file. To preserve the comment and other field data exactly, it probably makes sense to change the encoding to ISO-8859-1. That encoding has no illegal byte values (both US-ASCII and UTF-8 do), so comment data will never be replaced due to decoding errors, and parsing followed by unparsing should reproduce the bytes exactly. It does mean Daffodil won't detect garbage or malicious comment data, but that can be handled outside of Daffodil if it matters to the use case.
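As a quick sanity check of that claim (using Python's built-in codecs, not Daffodil itself), here is a sketch showing that US-ASCII and UTF-8 replace the 0xa8 byte while ISO-8859-1 round-trips it losslessly. The comment bytes are made up for illustration:

```python
# Hypothetical comment data containing the problematic non-ASCII byte 0xa8
data = b"Comment with a non-ASCII byte: \xa8"

# US-ASCII and UTF-8 both reject 0xa8; decoding with replacement
# substitutes U+FFFD, so the original byte is lost
ascii_text = data.decode("us-ascii", errors="replace")
utf8_text = data.decode("utf-8", errors="replace")
assert ascii_text[-1] == "\ufffd"
assert utf8_text[-1] == "\ufffd"

# ISO-8859-1 maps every byte 0x00-0xFF to a code point, so a
# decode/encode round trip preserves the data exactly
latin1_text = data.decode("iso-8859-1")
assert latin1_text.encode("iso-8859-1") == data
```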
Would you like to create a pull request switching to ISO-8859-1?
dirtyword5x.jpg has a non-ASCII byte 0xa8 at offset 0x1a1b. test25.jpg has the same non-ASCII byte at offset 0x17ef.
When these files are parsed and then unparsed, the 0xa8 byte becomes 0x3f ('?'). After the parse, the infoset has 0xa8 translated to 0xEFBFBD, the UTF-8 encoding of the replacement character U+FFFD. If the encoding in the schema is changed from US-ASCII to UTF-8, 0xa8 becomes 0xEFBFBD in the final JPEG as well, which is at least clearer.
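The byte translation described above can be reproduced with Python's codecs (a sketch, not Daffodil's actual code path):

```python
# Decoding the non-ASCII byte 0xa8 as UTF-8 with replacement yields
# the replacement character U+FFFD
replaced = b"\xa8".decode("utf-8", errors="replace")
assert replaced == "\ufffd"

# U+FFFD encodes in UTF-8 as the three bytes EF BF BD, which is what
# appears in the infoset and in the unparsed JPEG under a UTF-8 schema
assert replaced.encode("utf-8") == b"\xef\xbf\xbd"
```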
Should the encoding be UTF-8 or something else other than US-ASCII?