DistributedProofreaders / guiguts

Perl/Tk text editor designed for editing and formatting public domain material for inclusion at Project Gutenberg
GNU General Public License v2.0

Don't assume good/badwords files are utf8-encoded #1267

Closed (windymilla closed 1 year ago)

windymilla commented 1 year ago

Apparently, they can be "Latin-1" encoded; previous work assumed utf8.

windymilla commented 1 year ago

Sharon just explained to me that the goodwords files were recreated, but the zip that the PPer downloads was not. This zip dates from 2018. I checked the goodwords file downloaded directly from the project page (last modified May 2020, presumably when it was converted to utf8) and it is utf8. So it's just the old zips that could contain non-utf8 goodwords files. GG used to cope with that before my recent assumptive "improvements".
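For anyone following along, here is a rough sketch of the usual way to cope with a word list that might be either encoding: try a strict utf8 decode first, and only fall back to Latin-1 if that fails. It is written in Python purely for illustration and is not the Guiguts Perl code; the function names `decode_wordlist` and `parse_words` are hypothetical.

```python
def decode_wordlist(raw):
    """Decode word-list bytes, preferring UTF-8 and falling back to Latin-1.

    Returns (text, encoding_used). Hypothetical helper, not Guiguts code.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


def parse_words(text):
    """Split decoded text into words, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    old_zip_bytes = "café\nnaïve\n".encode("latin-1")   # as in an old zip
    new_file_bytes = "café\nnaïve\n".encode("utf-8")    # as regenerated
    for raw in (old_zip_bytes, new_file_bytes):
        text, encoding = decode_wordlist(raw)
        print(encoding, parse_words(text))
```

The order matters: Latin-1 accepts any byte sequence, so trying it first would silently garble utf8 files. The reverse mistake is unlikely in practice, because accented Latin-1 text almost never forms valid utf8 by accident.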

windymilla commented 1 year ago

It's thanks to @srjfoo of course, very much on the ball! Thanks too, Casey, for checking up on the explanation.

srjfoo commented 1 year ago

Looking at the project history, the project was checked out at the time. It was just recently returned to the pool, whereupon Charlie checked it out. I don't remember for sure, but I don't think we regenerated files for projects that were checked out for PP, did we?