dylex / zip-stream

Haskell ZIP archive streaming processing using conduit
BSD 3-Clause "New" or "Revised" License

Handle filename encoding #3

Closed gkleen closed 6 years ago

gkleen commented 6 years ago

The documentation says about filenames "usually UTF-8 encoded".

ZIP actually has a flag (bit 11 of the general purpose bit flag) to indicate that filenames are UTF-8 encoded. If the flag is not set, filenames are, by specification, encoded as CP437. (I'm aware that this is not the case in practice, but since there is no way to indicate any other encoding, assuming CP437 would at least break fewer things than also assuming UTF-8 in this case.)
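
For illustration, a minimal sketch of that rule, assuming the entry's 16-bit general purpose bit flag and raw name bytes have already been parsed. The names `isUtf8Name`, `decodeCp437`, and `decodeName` are hypothetical and not part of zip-stream. Bytes below 0x80 are treated as plain ASCII, and invalid UTF-8 is replaced rather than rejected; both are choices made for this sketch.

```haskell
import Data.Bits (testBit)
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)
import Data.Word (Word16, Word8)

-- | Bit 11 of the general purpose bit flag marks the name as UTF-8.
isUtf8Name :: Word16 -> Bool
isUtf8Name flags = testBit flags 11

-- | Upper half (0x80..0xFF) of code page 437, 16 characters per row.
cp437High :: String
cp437High = concat
  [ "ÇüéâäàåçêëèïîìÄÅ"
  , "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ"
  , "áíóúñÑªº¿⌐¬½¼¡«»"
  , "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐"
  , "└┴┬├─┼╞╟╚╔╩╦╠═╬╧"
  , "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀"
  , "αßΓπΣσµτΦΘΩδ∞φε∩"
  , "≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\160"  -- 0xFF is a no-break space
  ]

-- | Map one CP437 byte to a Unicode character.
-- Assumption: bytes below 0x80 are passed through as ASCII.
cp437Char :: Word8 -> Char
cp437Char w
  | w < 0x80  = toEnum (fromIntegral w)
  | otherwise = cp437High !! (fromIntegral w - 0x80)

decodeCp437 :: BS.ByteString -> Text
decodeCp437 = T.pack . map cp437Char . BS.unpack

-- | Decode a raw entry name according to the flag:
-- UTF-8 when bit 11 is set, CP437 otherwise.
decodeName :: Word16 -> BS.ByteString -> Text
decodeName flags raw
  | isUtf8Name flags = TE.decodeUtf8With lenientDecode raw
  | otherwise        = decodeCp437 raw
```

With this, `decodeName 0x0800` treats the bytes as UTF-8, while `decodeName 0` falls back to the CP437 table.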

dylex commented 6 years ago

This seems correct, but given real-world usage and performance concerns, I'm tempted to handle it a different way: simply expose the flag to users (along with the ByteString encoded path), but not actually do the decoding, at least not automatically by default. Could add an optional layer that then did the decoding. How would you feel about that?
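
A rough sketch of what that split could look like, purely as an assumption about the shape of the API: the record `EntryName`, its fields, and `decodeEntryName` are hypothetical names, not the library's actual interface. The raw bytes and the flag are always exposed, and decoding is an opt-in helper that lets the caller choose what to do when the flag is not set.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical view of an entry's name: the undecoded bytes plus
-- the UTF-8 flag from the entry header. No decoding happens here.
data EntryName = EntryName
  { entryNameRaw    :: BS.ByteString  -- path exactly as stored in the archive
  , entryNameIsUtf8 :: Bool           -- True when the UTF-8 flag is set
  } deriving (Eq, Show)

-- | Optional decoding layer. When the flag is set the name is decoded
-- as UTF-8 (replacing invalid sequences); otherwise the caller-supplied
-- fallback decoder is used, since the real encoding is not recorded.
decodeEntryName :: (BS.ByteString -> Text) -> EntryName -> Text
decodeEntryName fallback (EntryName raw isUtf8)
  | isUtf8    = TE.decodeUtf8With lenientDecode raw
  | otherwise = fallback raw
```

A caller who does not care about text can ignore `decodeEntryName` entirely and pay no decoding cost; one who does might write `decodeEntryName TE.decodeLatin1 name`, or pass a CP437 decoder to follow the specification.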

gkleen commented 6 years ago

As long as the flag is exposed (and documented) I'd be happy.

I'll have a shot at implementing that over the next few days, provided you don't want to take point?

dylex commented 6 years ago

Go for it. I think you have most of the hard parts covered already so hopefully just some reorganization. And thanks for the #2 PR as well.