alexdej / puzpy

Python library for reading and writing Across Lite crossword puzzle .puz files.
MIT License
112 stars 32 forks

Support Unicode, including emoji #14

Closed sfiera closed 3 years ago

sfiera commented 6 years ago

UTF-8 is the standard encoding in version 2.0 of the PUZ binary format. Increase the default version for newly-created Puzzle objects to 2.0, and use UTF-8 in that case.
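
A minimal sketch of the version-dependent encoding described above. The helper names are hypothetical and not part of puzpy's API, and the sketch assumes that pre-2.0 files use ISO-8859-1 (Latin-1) and that strings are stored NUL-terminated:

```python
# Hypothetical helpers for illustration; not puzpy's actual code.

def encoding_for_version(version: str) -> str:
    """Pick a text encoding from a .puz version string such as '1.4' or '2.0'."""
    major = int(version.split(".")[0])
    # Assumption: 2.0 and later store text as UTF-8, earlier versions as Latin-1.
    return "utf-8" if major >= 2 else "iso-8859-1"


def encode_string(text: str, version: str) -> bytes:
    """Encode one string field and append the terminating NUL byte."""
    return text.encode(encoding_for_version(version)) + b"\0"
```

With these helpers, `encode_string("é", "1.4")` yields one byte for the character, while `encode_string("é", "2.0")` yields two.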

svisser commented 6 years ago

@sfiera hi there, thanks for adding support for UTF-8 for .puz 2.0 files.

I think the changes are fine but I'm a bit concerned with passing the entire Puzzle instance to the PuzzleBuffer rather than only the encoding. The PuzzleBuffer needs to know the encoding but it shouldn't really be possible to access arbitrary puzzle information within the PuzzleBuffer.
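
Roughly what I have in mind, as a sketch only. `EncodedBuffer` is a hypothetical stand-in for PuzzleBuffer, not the real class; the point is that the constructor receives an encoding name rather than the Puzzle:

```python
class EncodedBuffer:
    """Hypothetical stand-in for PuzzleBuffer: it knows an encoding, not the Puzzle."""

    def __init__(self, data: bytes = b"", encoding: str = "iso-8859-1"):
        self.data = data
        self.encoding = encoding  # the only puzzle-level detail the buffer needs
        self.pos = 0

    def read_string(self) -> str:
        """Read one NUL-terminated string and decode it with the buffer's encoding."""
        end = self.data.index(b"\0", self.pos)
        raw = self.data[self.pos:end]
        self.pos = end + 1  # skip past the terminator
        return raw.decode(self.encoding)
```

The caller would decide the encoding once (for example from the file version) and pass only that string in, so the buffer cannot reach any other puzzle state.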

Would you agree if I make that change before merging this in?

sfiera commented 6 years ago

That sounds reasonable. I was following the example of Markup, which doesn’t seem to need anything but puzzle.extensions.

Having made a few puzzles with this, I’m now less sure that defaulting to 2.0 is a good idea. It works with Across Lite, but not Shortyz, for example (parses as Latin-1 anyway). I’ve gone back to 1.4 for my own, but it’s your call what the default should be.
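
To illustrate the kind of garbling this causes, here is a small standalone sketch (plain Python, not puzpy code) of UTF-8 bytes being decoded as Latin-1:

```python
clue = "Caf\u00e9 \u2615"  # "Café" followed by a coffee-cup symbol

stored = clue.encode("utf-8")          # what a 2.0 file would contain
misread = stored.decode("iso-8859-1")  # what a Latin-1-only reader displays

# Latin-1 maps every byte to exactly one character, so each multi-byte
# UTF-8 sequence turns into several unrelated characters.
print(repr(misread))

# The damage is reversible as long as the bytes themselves are untouched.
recovered = misread.encode("iso-8859-1").decode("utf-8")
```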

svisser commented 6 years ago

That's true, looks like I'll have to refactor Markup and Rebus as well.

It may make sense to file an issue with Shortyz for .puz 2.0 support.

I won't update the default to 2.0 for now.

alexdej commented 3 years ago

Thanks for contributing this. I agree with Simeon that we'd want a slightly different factoring for this, so I went ahead and incorporated that into a new PR based on yours, which I will merge shortly. Please let me know if you have any problems with the adapted change.