animetosho / ParPar

High performance PAR2 create client for NodeJS

utf-8 support #4

Closed misieck closed 7 years ago

misieck commented 7 years ago

Non-ascii characters (like öäå) in file names are saved garbled and then reported as not found when verifying.

animetosho commented 7 years ago

The PAR2 specification does not support UTF-8. Filenames need to be in ASCII (which is what ParPar does), although I suspect other clients may be using the local character set instead of interpreting names as ASCII (which may be where your problem lies).

PAR2 does support a "Unicode filename packet", where the name is encoded in UTF-16/UCS-2. ParPar should also be generating such a packet, by default, if the filename contains non-ASCII characters.
Are you seeing a case where this isn't happening?

> Non-ascii characters (like öäå)

Note that these characters fall within 8-bit "extended ASCII" (e.g. ISO-8859-1), though outside the 7-bit ASCII range.
There's actually an internal setting to force ParPar to always generate Unicode packets, which I could expose. Perhaps it'd make more sense to get the auto-detect algorithm to always generate the Unicode packet if any character falls outside the 7-bit ASCII range, which should deal with problematic clients...
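
For illustration, the 7-bit check described above could look roughly like the sketch below. This is not ParPar's actual code; `needsUnicodePacket` is a hypothetical helper name, and the only assumption is that any character above 0x7F should trigger the Unicode filename packet.

```javascript
// Hypothetical helper (not part of ParPar's API): decide whether a filename
// contains anything outside 7-bit ASCII and therefore warrants a PAR2
// Unicode filename packet.
function needsUnicodePacket(filename) {
  for (let i = 0; i < filename.length; i++) {
    // charCodeAt returns a UTF-16 code unit; anything above 0x7F is non-ASCII
    if (filename.charCodeAt(i) > 0x7f) return true;
  }
  return false;
}

console.log(needsUnicodePacket('readme.txt'));
console.log(needsUnicodePacket('\u00f6\u00e4\u00e5.txt')); // "öäå.txt"
```

Checking against 0x7F rather than 0xFF means names such as "öäå.txt", which fit in an 8-bit character set, are still flagged.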

animetosho commented 7 years ago

Hopefully this now works for you by default.

animetosho commented 6 years ago

I may have misunderstood the original intent of this issue, but anyway, to provide an update:
I've changed how the 'ASCII filenames' are written: they are now encoded as UTF-8. Using 'ASCII' encoding is likely incorrect, despite what the specification says, and it seems there's at least one client which has issues with this. UTF-8 may still end up being wrong (about which not a whole lot can be done) but should at least avoid compatibility issues.

It is still recommended that clients use the Unicode packet, if available, for maximum compatibility.
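
To make the encoding mismatch concrete, here is a small Node.js sketch (illustrative only, not taken from ParPar) showing how the same filename maps to different bytes under Latin-1, UTF-8, and UTF-16LE, and how a name comes out garbled when the reader assumes a different encoding than the writer. The little-endian byte order for the UTF-16 form is an assumption here, and the variable names are made up for the example.

```javascript
// Illustrative only: one filename, three byte representations.
const name = '\u00f6\u00e4\u00e5.txt'; // "öäå.txt"

const asLatin1 = Buffer.from(name, 'latin1');  // one byte per character for this name
const asUtf8 = Buffer.from(name, 'utf8');      // two bytes each for ö, ä and å
const asUtf16 = Buffer.from(name, 'utf16le');  // two bytes per UTF-16 code unit

console.log(asLatin1.length, asUtf8.length, asUtf16.length);

// A client that reads UTF-8 bytes as Latin-1 sees a garbled name,
// so a lookup for the original file on disk would fail.
const misread = asUtf8.toString('latin1');
console.log(misread === name);

// Reading the bytes back with the matching encoding round-trips cleanly.
console.log(asUtf8.toString('utf8') === name);
```

The garbled result in the mismatched case is the kind of symptom reported at the top of this issue: the name is stored, but it no longer matches the file when verifying.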