axodotdev / axolotlsay

💬 a CLI for learning to distribute CLIs in rust
Apache License 2.0

Axolotl bubbles too much #1

non-descriptive closed this issue 1 year ago

non-descriptive commented 1 year ago

For some reason the top and bottom borders of the bubble are longer than needed when the axolotl speaks in non-ASCII characters.

         +----------------------+
         | Привет, мир |
         +----------------------+
        /
≽(◕ ᴗ ◕)≼
         +-----------------------+
         | やめてください |
         +-----------------------+
        /
≽(◕ ᴗ ◕)≼
         +-----------------+
         | لله أكبر |
         +-----------------+
        /
≽(◕ ᴗ ◕)≼
ashleygwilliams commented 1 year ago

oh! this is very interesting. thank you so much for filing!

the logic for the "bubbles" is very naive at the moment (https://github.com/axodotdev/axolotlsay/blob/main/src/main.rs#L14). the byte length of a non-ascii symbol is more than 1, and as a result we are generating too many border characters!

i can take a look at calculating this in a bit, but if you are interested in tackling it i would welcome a contribution. i think this stack overflow question has a pretty good explanation of the issue and some hints at tools to use to solve it: https://stackoverflow.com/questions/46290655/does-rusts-string-have-a-method-that-returns-the-number-of-characters-rather-th
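To make the byte-versus-character mismatch concrete, here is a minimal sketch using only the standard library. The `border` helper and its two columns of padding are hypothetical and are not taken from the axolotlsay source; they only mimic the shape of the bubbles reported above.

```rust
/// Hypothetical helper (not from axolotlsay): draws a horizontal border
/// for a bubble whose text is `width` columns wide, plus one space of
/// padding on each side.
fn border(width: usize) -> String {
    format!("+{}+", "-".repeat(width + 2))
}

fn main() {
    let text = "Привет, мир";

    // str::len() counts UTF-8 bytes. Each Cyrillic letter takes 2 bytes,
    // so 9 letters + comma + space = 20 bytes.
    let bytes = text.len();

    // chars().count() counts Unicode scalar values: 11 here.
    let chars = text.chars().count();

    println!("bytes = {bytes}, chars = {chars}");
    println!("{}", border(bytes)); // too long, as in the report
    println!("{}", border(chars)); // fits this particular string
}
```

Counting `char`s fixes the Cyrillic case, but it is still only an approximation of what a terminal draws: combining marks and wide characters need more than a scalar-value count.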

ashleygwilliams commented 1 year ago

at least partially solved by #2 - i'll leave it up to @non-descriptive if they want to close this and file new more specific issues, or leave this one and explain some of the further issues/improvements :)

non-descriptive commented 1 year ago

Terminals can do many things, but when it comes to character encoding, things get complicated.

  1. Some characters are just one character, and others can be composed of several characters: "é" vs "é" vs e̞̫ͧͫ̕. Most terminals can render only one combining character and will add extra placeholders for the other combining characters. The last e looks like this in Windows Terminal, for example: (image). So I think no axolotl can speak Z̙A̪̲͖̼̬͉͇L͙͎͈̜G͕̗̳̤͎̦Ǫ͎̼̫ invocations. Not in a terminal, at least. As a possible improvement, one can run a normalization over the characters of the string to make them singular and discard the rest. But then comes the second issue.
  2. Some characters are wider than a regular alphabet letter. This is easy to reproduce with the Japanese text above on an unpatched version. CJK symbols have two representations, full-width and half-width, where a full-width character takes the space of 2 regular characters. The unicode-width crate solves this kind of problem, but only partially. I think it assumes that sequences of characters usually don't change width much, so it fails to measure other kinds of input, such as smileys like ¯\_( ͡° ͜ʖ ͡°)_/¯ or other table flippers. Such input basically combines the width problem with the combining-characters problem. And no amphibian can measure the width of this nasty input for every kind of terminal out there, and there are many. Not sure if anyone can fix it, but you can dare.
  3. Also, I haven't tested much with switching the writing direction from left-to-right to right-to-left (and vice versa) in the middle of text that runs the opposite way, or with placing random bidi characters in it. I'm also not sure whether this kind of text is a valid symbol sequence or just an undocumented feature, like the zalgoed text above.
  4. As the ultimate task, one can try running a fuzzing test, but that will totally be an enterprise-level "hello world" with i18n support.
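Points 1 and 2 can be illustrated with a rough, standard-library-only sketch. The `rough_char_width` and `rough_width` functions are hypothetical, and the code point ranges are a deliberately incomplete assumption chosen for illustration; this is not how the unicode-width crate works internally, and it does not handle bidi text, emoji, or terminal-specific rendering.

```rust
/// Hypothetical, incomplete estimate of how many terminal columns a
/// single char occupies. The ranges below are illustrative assumptions.
fn rough_char_width(c: char) -> usize {
    match c as u32 {
        // Combining Diacritical Marks: drawn on top of the previous char.
        0x0300..=0x036F => 0,
        // Hiragana and Katakana, CJK Unified Ideographs, and the
        // full-width part of Halfwidth and Fullwidth Forms.
        0x3040..=0x30FF | 0x4E00..=0x9FFF | 0xFF01..=0xFF60 => 2,
        // Everything else is assumed to take one column.
        _ => 1,
    }
}

/// Sums the per-char estimates for a whole string.
fn rough_width(s: &str) -> usize {
    s.chars().map(rough_char_width).sum()
}

fn main() {
    let precomposed = "\u{00E9}"; // "é" as a single code point
    let decomposed = "e\u{0301}"; // "e" followed by a combining acute accent

    // The two strings look the same but have different char counts.
    println!("chars: {} vs {}", precomposed.chars().count(), decomposed.chars().count());

    // The rough estimate treats both as one column wide.
    println!("width: {} vs {}", rough_width(precomposed), rough_width(decomposed));

    // Seven full-width kana need 14 columns, not 7.
    println!("kana width: {}", rough_width("やめてください"));
}
```

A table like this breaks down quickly on the inputs described above (stacked combining marks, mixed-direction text), which is why a real implementation would more likely lean on a maintained crate and still accept that some terminals will disagree.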