etbcor / nasin-nanpa

nasin sitelen tan anpa nanpa
MIT License

nasin nanpa omits some digits (1234) #10

Closed Sjlver closed 7 months ago

Sjlver commented 7 months ago

nasin nanpa normally keeps letters and digits that it does not know, such as capital letters.

However, the digits 1 to 4 seem to be missing. For example:

open CAPITAL nanpa 1234567890 nanpa 0987654321 nanpa 3.141592 pini

... is rendered as:

[image: screenshot of the rendered text]

Note that these digits are not replaced with the glyphs for wan, tu, tu wan, or tu tu. They are simply missing.

My preferred solution would be to include these digits in nasin nanpa. I know that toki pona doesn't like numbers, but for texts that do occasionally contain them, it would be better to render all their digits.

Sjlver commented 7 months ago

Note: as a workaround, I'm using replacement Unicode characters such as U+1D7E3 MATHEMATICAL SANS-SERIF DIGIT ONE. That doesn't look so good, though.
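A minimal sketch of this workaround in Python, assuming the text is preprocessed before it reaches the font. The function name `replace_ligating_digits` is hypothetical; the only fact relied on is that the mathematical sans-serif digits occupy U+1D7E2 (zero) through U+1D7EB (nine), so U+1D7E3 is digit one.

```python
# U+1D7E2 is MATHEMATICAL SANS-SERIF DIGIT ZERO; digits 1-9 follow it in order.
SANS_SERIF_ZERO = 0x1D7E2


def replace_ligating_digits(text: str, digits: str = "1234") -> str:
    """Swap the given ASCII digits for their sans-serif mathematical twins.

    By default only 1-4 are replaced, since those are the digits that
    nasin nanpa 3.1 turns into variation selectors.
    """
    table = {ord(d): chr(SANS_SERIF_ZERO + int(d)) for d in digits}
    return text.translate(table)


if __name__ == "__main__":
    print(replace_ligating_digits("nanpa 1234567890"))
```

Passing `digits="0123456789"` replaces every digit, which keeps the number in one visual style at the cost of using no plain ASCII digits at all.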

etbcor commented 7 months ago

Hi! Thanks for leaving a comment. Believe it or not, this is actually intended behavior for version 3.1, as the digits 1-4 have ligatures that turn them into variation selectors 1-4 (U+FE00 - U+FE03), so that alternate glyphs are easily accessible.

This will also be the case in the main version of nasin nanpa 4, but starting with v4, I'll also be releasing a version without all those ligatures (which only displays SP via UCSUR codepoints), which sounds like it would be helpful for you!

I'm still working on v4, and I'll hopefully be ready to release it soon. For the time being though, it is possible to get nasin nanpa 3.1 to render 1-4 if you have control over font-features in your application. You would just need to disable ligatures (liga) for the selection of text containing the digits.
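For a web page, one way to apply this is to wrap each run of digits in an element whose CSS switches off `liga`. The sketch below is an assumption about how that could be done in a build step, not something taken from nasin nanpa itself: the helper `wrap_digits` and the class name `no-liga` are hypothetical, while `font-feature-settings: "liga" 0` is the standard CSS declaration for disabling that feature.

```python
import re
from html import escape

# Hypothetical class name; the matching CSS rule disables standard ligatures.
NO_LIGA_CLASS = "no-liga"
NO_LIGA_CSS = '.no-liga { font-feature-settings: "liga" 0; }'

# A run of ASCII digits, optionally with "." or "," between digit groups,
# so that a number like 3.141592 ends up in a single span.
DIGIT_RUN = re.compile(r"[0-9]+(?:[.,][0-9]+)*")


def wrap_digits(text: str) -> str:
    """Return HTML in which every digit run is wrapped in a no-ligature span."""
    parts = []
    pos = 0
    for match in DIGIT_RUN.finditer(text):
        # Escape the text between numbers; the numbers need no escaping.
        parts.append(escape(text[pos:match.start()]))
        parts.append(f'<span class="{NO_LIGA_CLASS}">{match.group()}</span>')
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(NO_LIGA_CSS)
    print(wrap_digits("open nanpa 1234567890 nanpa 3.141592 pini"))
```

Limiting the rule to digit runs leaves the ligatures for the rest of the text untouched, which matters if the surrounding words rely on them to become sitelen pona glyphs.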

I hope this clears some things up -- and that the UCSUR version of nasin nanpa will be helpful for you once it's out!

Sjlver commented 7 months ago

Thanks for the explanation! It's OK to close this issue if that's indeed the desired behavior. Regarding my use of nasin nanpa, I published a blog post a day ago (https://blog.purpureus.net/posts/monsuta-li-moli-e-jan-500m-li-pini/). I will check whether I can control the use of ligatures in the CSS.

With that said, I admit that I don't fully understand the logic. Why can't the digits 1-4 represent themselves when they stand alone? I understand that ko1 and ko2 are combined into different variations of ko. I would similarly understand if U+F191C (ko) followed by 2 were combined into a ko variant. However, if 2 is not preceded by something suitable (say, a space or another digit), why can't it just be the digit 2?
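The rule proposed here can be modeled in a few lines of Python. This is only a sketch of the logic, not how the font implements it (a font would express this as OpenType substitution rules). It assumes that "something suitable" means a codepoint in the UCSUR sitelen pona block, taken here to be U+F1900 to U+F19FF, and it ignores Latin-letter ligatures such as ko1. The function name is hypothetical.

```python
# Assumption: the UCSUR sitelen pona block spans U+F1900..U+F19FF.
UCSUR_SITELEN_PONA = range(0xF1900, 0xF1A00)

# Digits 1-4 map to variation selectors 1-4 (U+FE00..U+FE03).
DIGIT_TO_SELECTOR = {"1": "\uFE00", "2": "\uFE01", "3": "\uFE02", "4": "\uFE03"}


def apply_contextual_selectors(text: str) -> str:
    """Turn 1-4 into a variation selector only directly after a glyph.

    A digit that follows a space, another digit, or nothing at all
    is left alone and stays an ordinary digit.
    """
    out = []
    for ch in text:
        prev = out[-1] if out else ""
        follows_glyph = bool(prev) and ord(prev) in UCSUR_SITELEN_PONA
        if ch in DIGIT_TO_SELECTOR and follows_glyph:
            out.append(DIGIT_TO_SELECTOR[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    ko = "\U000F191C"
    print([hex(ord(c)) for c in apply_contextual_selectors(ko + "2")])
    print(apply_contextual_selectors("nanpa 1234"))
```

Under this rule a second digit after a glyph stays a digit, because its predecessor is then a variation selector rather than a glyph.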

I don't want to be demanding here, nor consume much of your time. Feel free to not answer the question and close this issue :) nasin nanpa worked well enough for that blog post*


* There was one other issue, but I opened #11 separately for that.

etbcor commented 7 months ago

Actually, that way of doing it is totally doable, and I see the advantage it would offer. It should be a relatively easy change to make, so I'll put it in v4!

etbcor commented 7 months ago

Ok, I tweaked the script a bit, and this should now be set up correctly for v4. Closing #10!