fragglet / lhasa

Free Software LHA implementation
http://fragglet.github.io/lhasa/
ISC License

Should add flags for filename encodings #43

Open fragglet opened 1 year ago

fragglet commented 1 year ago

The code currently does no translation of filename encodings, and filenames can be encoded in a variety of ways. In particular, Shift-JIS and EUC support are important since the lha format is/was very popular in Japan. Unfortunately these will need to be specified manually since, as far as I know, there is no way to detect the encoding. We should internally translate everything to UTF-8.

There are some extended ASCII formats that can be reasonably autodetected based on the OS field: for example, CP437 is probably a sensible default for DOS archives (or the system codepage when running on Windows), and Mac Extended ASCII for macOS archives. If the encoding cannot be determined, then non-ASCII characters should become the Unicode replacement character.
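
A rough sketch of that selection logic, written in Python only for brevity (lhasa itself is C). The names `DEFAULT_CODECS`, `decode_filename`, the `"msdos"`/`"macos"` keys and the `override` parameter are hypothetical and not part of lhasa; the sketch also assumes that "Mac Extended ASCII" corresponds to Python's `mac_roman` codec, and it leaves out the Windows system codepage case.

```python
from typing import Optional

# Hypothetical defaults keyed by a symbolic OS type taken from the header.
DEFAULT_CODECS = {
    "msdos": "cp437",      # DOS archives
    "macos": "mac_roman",  # assumed equivalent of "Mac Extended ASCII"
}


def decode_filename(raw: bytes, os_type: str,
                    override: Optional[str] = None) -> str:
    """Decode a raw archive filename to a Unicode string.

    An explicit override (e.g. "shift_jis" or "euc_jp") always wins,
    since those encodings cannot be detected automatically.
    """
    codec = override or DEFAULT_CODECS.get(os_type)
    if codec is None:
        # Encoding unknown: keep ASCII, turn every other byte into U+FFFD.
        return raw.decode("ascii", errors="replace")
    # Invalid sequences in the chosen codec also become U+FFFD.
    return raw.decode(codec, errors="replace")
```

For example, `decode_filename(b"espa\xa4ol.ct", "msdos")` yields `español.ct`, while the same bytes with an unknown OS type yield `espa\ufffdol.ct`.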

With this in place we can relax the "safe print" code currently in place, although it's still important to never print a terminal escape character or anything in the C0/C1 control character ranges (and probably the Specials range too).
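
A minimal sketch of what such a relaxed filter might look like, again in Python for brevity. `sanitize_for_terminal` and its `placeholder` parameter are hypothetical names, not existing lhasa code. One assumption to flag: U+FFFD itself sits in the Specials block (U+FFF0 to U+FFFF), so this sketch lets it through so that the replacement character from undecodable filenames stays visible.

```python
def sanitize_for_terminal(name: str, placeholder: str = "?") -> str:
    """Replace characters that should never reach a terminal."""

    def is_unsafe(ch: str) -> bool:
        cp = ord(ch)
        if cp <= 0x1F or cp == 0x7F:
            return True   # C0 controls (includes ESC, U+001B) and DEL
        if 0x80 <= cp <= 0x9F:
            return True   # C1 controls (includes CSI, U+009B)
        if 0xFFF0 <= cp <= 0xFFFF and cp != 0xFFFD:
            return True   # Specials block, except U+FFFD (assumption)
        return False

    return "".join(placeholder if is_unsafe(ch) else ch for ch in name)
```

With this, `sanitize_for_terminal("a\x1b[31mb")` gives `a?[31mb`, while ordinary non-ASCII names such as `español.ct` pass through unchanged.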

gryf commented 10 months ago

Also, lha has been popular on AmigaOS. The default encoding seems to be Latin1, although there are different mappings for some countries which don't easily fit into Latin1. I guess auto-detection for corner cases could be difficult, if not impossible. Perhaps an external mapfile passed as a command line option could help in such situations, so that lhasa doesn't need to make assumptions.
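
To make the mapfile idea concrete, here is a small Python sketch. Everything about it is hypothetical: the file format (one `<byte in hex> <code point in hex>` pair per line, `#` for comments), the function names, and the choice to fall back to Latin1 for bytes the map does not mention. No such option exists in lhasa today.

```python
from typing import Dict


def parse_mapfile(text: str) -> Dict[int, str]:
    """Parse lines like '0xA4 20AC' into a byte -> character table."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        byte_hex, codepoint_hex = line.split()
        table[int(byte_hex, 16)] = chr(int(codepoint_hex, 16))
    return table


def decode_with_map(raw: bytes, table: Dict[int, str]) -> str:
    """Decode byte by byte; unmapped bytes are treated as Latin1."""
    # Latin1 maps each byte value to the code point of the same number.
    return "".join(table.get(b, chr(b)) for b in raw)
```

For instance, a map containing the single line `0xA4 20AC` would decode the byte `0xA4` as the euro sign while leaving every other byte as plain Latin1.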

polluks commented 8 months ago

Indeed, Latin1 looks strange:

...
[generic]                  909    2192  41.5% -lh5- 651e Nov 24  2018 AmiArcadia/Source/generic/espaÐl.ct
[generic]                  935    2225  42.0% -lh5- 3231 Nov 24  2018 AmiArcadia/Source/generic/franíÂis.ct
...

http://aminet.net/package/misc/emu/AmiArcadiaMOS