Majored / rs-async-zip

An asynchronous ZIP archive reading/writing crate.
MIT License

Support to read Info-ZIP Unicode Path Extra Field #103

Closed by ArcticLampyrid 1 year ago

ArcticLampyrid commented 1 year ago

See also: "How to use Unicode filenames in ZIP format" and "APPNOTE.TXT - .ZIP File Format Specification".

ArcticLampyrid commented 1 year ago

Since most tools on Windows store the Unicode path in the "Info-ZIP Unicode Path Extra Field", users have to handle this extension properly; otherwise they will run into failures on non-ASCII characters. Such characters are especially common in non-English languages. Filenames are so basic an attribute that handling them belongs at the library level.

According to the spec:

If both the File Name and Comment fields are UTF-8, the new General Purpose Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate that both the header File Name and Comment fields are UTF-8 and, in this case, the Unicode Path and Unicode Comment extra fields are not needed and SHOULD NOT be created.

Note that, for backward compatibility, bit 11 SHOULD only be used if the native character set of the paths and comments being zipped up are already in UTF-8.
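To make the quoted rule concrete, here is a minimal sketch of the bit 11 check. The constant and function names are made up for illustration and are not part of this crate's API; the only assumption is that the caller already has the 16-bit general purpose bit flag from the file header.

```rust
/// Bit 11 of the general purpose bit flag: Language encoding flag (EFS).
const EFS_FLAG: u16 = 1 << 11;

/// Hypothetical helper: true when the header File Name and Comment
/// fields are declared to be UTF-8.
fn header_name_is_utf8(general_purpose_flags: u16) -> bool {
    general_purpose_flags & EFS_FLAG != 0
}
```

When this returns false, the header name is in some legacy encoding, and the Unicode Path extra field (if present) is the only reliable source for the real name.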

For Windows users, the native character set is NEVER UTF-8. So any tool that follows this recommendation will not set bit 11 there, which means we have to handle this extension properly.
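For reference, a rough sketch of what reading this field could look like, based on my reading of APPNOTE.TXT: header ID 0x7075, then a version byte, a CRC-32 of the original header file name, and the UTF-8 name. The function names below are hypothetical and not this crate's API. The CRC-32 is written out by hand only to keep the example dependency-free; a real implementation would probably reuse an existing CRC crate.

```rust
/// Header ID of the Info-ZIP Unicode Path Extra Field, per APPNOTE.TXT.
const UNICODE_PATH_ID: u16 = 0x7075;

/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant ZIP uses.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: scan a raw extra-field block for the Unicode Path
/// field and return its UTF-8 name. `raw_name` is the file name bytes
/// exactly as stored in the header.
fn unicode_path(extra: &[u8], raw_name: &[u8]) -> Option<String> {
    let mut rest = extra;
    while rest.len() >= 4 {
        let id = u16::from_le_bytes([rest[0], rest[1]]);
        let size = u16::from_le_bytes([rest[2], rest[3]]) as usize;
        // A record that claims more bytes than remain is malformed: give up.
        let data = rest.get(4..4 + size)?;
        rest = &rest[4 + size..];
        if id != UNICODE_PATH_ID {
            continue;
        }
        // Layout: version (1 byte), NameCRC32 (4 bytes, LE), UTF-8 name.
        if data.len() < 5 || data[0] != 1 {
            return None;
        }
        let stored_crc = u32::from_le_bytes([data[1], data[2], data[3], data[4]]);
        if stored_crc != crc32(raw_name) {
            // The header name was changed after this field was written,
            // so the Unicode name can no longer be trusted.
            return None;
        }
        return String::from_utf8(data[5..].to_vec()).ok();
    }
    None
}
```

The CRC check matters: the spec uses it to detect tools that renamed the entry without updating the extra field, in which case the header name should be used instead.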