cslarsen / jp2a

Converts jpg images to ASCII
GNU General Public License v2.0

Special characters don't work as a custom character palette? #13

Open supercom32 opened 4 years ago

supercom32 commented 4 years ago

Hi Everyone,

Under Ubuntu, I noticed that if you try to use special characters as your palette, all jp2a seems to return are those funny "<?>" characters that signify that the terminal can't render them. But the strange thing is, if I paste my UTF-8/Unicode characters like "░▒▓" into the terminal, they render just fine. It's only after jp2a reads them that they get mangled.

Does anyone have any ideas how to get this working? Maybe I need to specify special characters in a slightly different way?

Cheers,

EDIT:

As a workaround, I noticed that you can simply set your palette to something jp2a does understand (e.g. "#$%") and then do a find/replace on the output to put your special character palette back in. It's kind of a hack, but at least the characters don't get mangled!
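The find/replace step of this workaround could be sketched in Python as below. This is only an illustration: `restore_palette` is a hypothetical helper name, the sample string stands in for real jp2a output, and the sketch assumes the placeholder characters appear nowhere in the output except as palette characters.

```python
def restore_palette(ascii_art: str, placeholders: str, replacements: str) -> str:
    """Swap each placeholder character for the replacement at the same position."""
    if len(placeholders) != len(replacements):
        raise ValueError("both palettes must have the same number of characters")
    # str.maketrans builds a one-to-one character mapping; translate applies it.
    return ascii_art.translate(str.maketrans(placeholders, replacements))


# Stand-in for output produced with an ASCII palette of "#$%" (not real jp2a output).
sample = "#$%\n%$#"
print(restore_palette(sample, "#$%", "░▒▓"))
```

Because the mapping is applied per character after jp2a has finished, jp2a itself only ever sees single-byte ASCII characters.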

Talinx commented 4 years ago

jp2a only handles ASCII and therefore assumes that n bytes always equal n characters, which is not generally true for UTF-8. The verbose output shows this:

Output palette (9 chars): '░▒▓'

Internally, jp2a uses an array for the ASCII palette and just indexes into it, which is very efficient. So the terminal gets only 1 byte of a 2-, 3- or 4-byte UTF-8 character, which is not a valid character on its own (and so displays <?>).
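The mismatch described above can be reproduced outside jp2a. The following is a minimal Python sketch, not jp2a's actual code: it counts the bytes in the three-character palette and shows what happens when a single byte is taken out of a multi-byte sequence, which is roughly what byte-wise indexing does.

```python
palette = "░▒▓"
raw = palette.encode("utf-8")

print(len(palette))  # number of code points: 3
print(len(raw))      # number of bytes: 9, matching the "9 chars" in the verbose output

# Byte-wise indexing picks out one byte of a three-byte sequence.
single = raw[0:1]

# A lone lead byte is not valid UTF-8; decoding with errors="replace"
# yields U+FFFD, the replacement character many terminals draw as <?>.
print(single.decode("utf-8", errors="replace"))
```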

It is possible to add UTF-8 support:

[Screenshot: UTF-8 jp2a]

This quick-and-dirty implementation is a little bit slower (by 2.7% after some very rough initial measurements).
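One way a UTF-8-aware palette could work is to split the palette bytes into whole characters once, up front, and then index into that list instead of into the raw bytes. The sketch below illustrates the idea in Python; it is not the implementation shown in the screenshot, and `utf8_length` and `split_palette` are hypothetical names. It only inspects lead bytes and does not validate continuation bytes.

```python
def utf8_length(lead: int) -> int:
    """Return the length in bytes of the UTF-8 sequence starting with this lead byte."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def split_palette(raw: bytes) -> list:
    """Split UTF-8 bytes into a list with one bytes object per character."""
    entries = []
    i = 0
    while i < len(raw):
        n = utf8_length(raw[i])
        chunk = raw[i:i + n]
        if len(chunk) != n:
            raise ValueError("truncated UTF-8 sequence")
        entries.append(chunk)
        i += n
    return entries


entries = split_palette("░▒▓".encode("utf-8"))
print(len(entries))                 # 3 palette entries instead of 9 bytes
print(entries[2].decode("utf-8"))   # ▓
```

Indexing `entries` by brightness level then always yields a complete character, at the cost of one extra pass over the palette at startup.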

I don't know if it is worth it to implement UTF-8 support: Are there many use cases? What is your opinion?

Annoyingly my terminal does not display special characters in monospace. And of course, this program is called jp2a, not jp2u. ASCII is in the name.

(jp2a is now maintained here: Talinx/jp2a)

supercom32 commented 4 years ago

Ah, thanks for the kind reply. I did not realise that the 'a' stood for ASCII. :-D.

Personally I think UTF-8 support is great because almost all modern terminals use it. It allows for more unique characters and designs, and it would be a shame to cut out such a large audience (since I'm not exactly sure how many people are willing to switch encoding schemes just for a small handful of programs). This is all in my humble opinion anyway. I completely understand if you don't feel the same way. I figured I might as well ask just in case. (^_^);

Talinx commented 4 years ago

Added UTF-8 support: https://github.com/Talinx/jp2a/commit/f7fc5ac7a3e8c7120de14949caca766271bc5685 😊🎉 UTF-8 is optional and the default, so if someone only wants ASCII, this can be done at compile time. This implementation is fast, so there should be no reason not to support UTF-8. (Unicode has some special cases where 2 code points are combined into 1 visible character, for example flags. This implementation can't handle that.)
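The flag limitation can be illustrated with a short Python sketch. It only demonstrates the Unicode behaviour and makes no claim about how the linked commit splits the palette; the German flag is an arbitrary example.

```python
# A flag emoji is two regional indicator code points (here D + E)
# that the terminal renders as a single glyph.
flag = "\U0001F1E9\U0001F1EA"

print(len(flag))                  # 2 code points
print(len(flag.encode("utf-8")))  # 8 bytes (4 per code point)

# Splitting a palette per code point therefore treats the flag as two entries.
palette = "░" + flag
print(len(list(palette)))         # 3 entries, although only 2 glyphs are visible
```

Handling such sequences would need grapheme cluster segmentation, which goes beyond decoding UTF-8.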