chubin / wttr.in

:partly_sunny: The right way to check the weather
https://wttr.in
Apache License 2.0

Devanagari and other complex scripts: formatting/rendering issues #676

Open chubin opened 2 years ago

chubin commented 2 years ago

There are several problems related to rendering complex scripts:

PNG rendering of complex scripts can be done using raqm (https://github.com/HOST-Oman/libraqm).

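```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new('RGBA', (100, 70), color=(73, 109, 137))

# 'sahadeva.ttf' must be a local font with Devanagari glyphs; the RAQM layout engine
# needs Pillow built with libraqm (older Pillow versions spell it ImageFont.LAYOUT_RAQM).
fnt = ImageFont.truetype('sahadeva.ttf', 15, layout_engine=ImageFont.Layout.RAQM)
d = ImageDraw.Draw(img)

d.text((10, 10), "यात्रा ऋच्छति", font=fnt, fill=(255, 255, 0))
img.save('pil_text_font.png')
```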
sanketgarade commented 2 years ago

Hi!

Fix formatting issues in browser

By this, do you mean fixing the misaligned lines and text so that they line up with the rest of the layout, as shown in the image below from mr.wttr.in?

Can you please provide some direction/steps on how to do this? I will try to do it for the Marathi language.

[Screenshot: mr.wttr.in output with misaligned lines]

chubin commented 2 years ago

@sanketgarade This is a good question, and thank you very much for helping me with it. I don't know Devanagari at all (maybe this is a good opportunity to learn it; I even started once, but quickly concluded that it would take quite a lot of time).

So the first and most important question is: which terminal should we use as a reference? I think I shared a screenshot of alacritty's rendering on Twitter, and you (I believe it was you) said that it was wrong. So should we stick to konsole?

sanketgarade commented 2 years ago

Yes, initially let's stick to konsole, because I have seen that Devanagari renders properly there if the correct font is chosen. That is not the case with most other terminals.

chubin commented 2 years ago

Do you know a good library (Python, Go, C, or actually any language) or algorithm for getting the visible length of some Devanagari text?

sanketgarade commented 2 years ago

Hey, sorry, I don't know either. I will see if I can get help from someone.

By "visible length" do you mean the length of the displayed text in pixels (or any other unit of length)?

chubin commented 2 years ago

@sanketgarade Sorry for disappearing; now I am back and would like to return to Devanagari rendering.

By "visible length" do you mean the length of the displayed text in pixels (or any other unit of length)?

Yes. I don't know if it applies to Devanagari script at all, but in some form, I think, yes: if you print some text on the terminal, it occupies a certain number of character positions.

With Devanagari it is a little more complicated, because some characters have zero width and are just modifiers of the previous character, and some characters (maybe; I don't know, but I hope you can shed more light on it) can occupy a non-integer number of "terminal positions".
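For illustration, a minimal sketch using only Python's standard unicodedata module (the helper name describe is hypothetical) lists each code point of a word from the example phrase used later in this thread, together with its Unicode general category. The virama shows up as Mn (a non-spacing mark) and the vowel sign AA as Mc (a spacing combining mark); whether a given terminal gives such marks a cell of their own is up to the terminal, so this only exposes the raw data.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


if __name__ == "__main__":
    # "हलक्या" is one word of the example phrase discussed in this thread.
    for code_point, category, name in describe("हलक्या"):
        print(code_point, category, name)
```

The virama (U+094D) is the only Mn code point in this word, while the final vowel sign (U+093E) is Mc.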

In any case, by visible length I mean "the number of terminal positions the text occupies when it is displayed".

Are you aware of any functions, in any language, that could do it?
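One widely used answer to this question is the per-code-point heuristic behind wcwidth()-style functions: combining marks and format characters count as 0 cells, East Asian wide or fullwidth characters as 2, and everything else as 1. Below is a simplified, standard-library-only sketch of that idea; char_width and visible_width are hypothetical names, and the rule set is an assumption, not what konsole or any other terminal is known to do. Because it looks at one code point at a time, it cannot account for conjuncts or for how a shaping terminal lays out vowel signs.

```python
import unicodedata

# Simplified wcwidth-style rule set (an assumption, not a terminal's real behavior):
# non-spacing marks, enclosing marks and format characters take no cell.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def char_width(ch: str) -> int:
    """Estimate how many terminal cells a single code point occupies."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def visible_width(text: str) -> int:
    """Sum of per-code-point widths; ignores shaping, conjuncts and ligatures."""
    return sum(char_width(ch) for ch in text)


if __name__ == "__main__":
    phrase = "पावसाच्या हलक्या सरी"
    print(len(phrase), visible_width(phrase))
```

For "पावसाच्या हलक्या सरी", the example phrase used later in this thread, it returns 18 (only the two viramas count as 0) against len()'s 20, which is still not the 15 positions observed there.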

sanketgarade commented 2 years ago

Hey @chubin

Thanks for the information. Actually, I haven't been able to work on this myself either, due to lack of time. But I was able to connect with someone who works on Indic projects. I'll share the information I received from them (though I haven't checked it in detail myself yet):

Automated Rendering Testing using HarfBuzz.

Using hb-view and piping its output to file, the size of the rendering can be seen in the output. See the image below.

[Screenshot: hb-view output piped to file, showing the size of a rendered Malayalam string]

The script in the image above is Malayalam, not Devanagari, but I think it should work similarly.

I'm not sure if this is what you are looking for. Let me know what you think.

chubin commented 2 years ago

It is a very interesting project; thank you very much for referencing it. It is not exactly what I am looking for, but maybe it could be helpful.

What I would like to know is how to find the "length" of such a string, in the sense of the number of occupied character positions. For example, the first word in your example takes 4 positions, and the second one 7.

Only in the terminal, I suppose. In a browser it may take more or fewer places, I believe, but in the terminal exactly this width. Maybe it depends on the terminal, but maybe it is the same in all terminals.

Maybe we should just make some "external" functions that return the width of any input string, and make them configurable depending on the terminal? Anyway, to begin with, it would be great to find at least one such function.

sanketgarade commented 2 years ago

Hey, sorry, I was away from this thread.

Anyway, about this:

Only in terminal, I suppose. In browser it would take more or less places, I believe, but in terminal exactly this width. Maybe it depends on the terminal, but maybe it is the same in all terminals.

I think the length in characters is the same in the terminal and the browser.

I did a small experiment to check this.

In the browser, I opened mr.wttr.in and copied the word "काहीसे" from it. When copying it, the length is shown as 6 characters.

[Screenshot: browser showing the copied word's length as 6 characters]

In the terminal, I typed the same word and ran wc (word count) on it, which also gave a length of 6 characters.

[Screenshot: wc output for the same word]

Conclusion: both results match.

sanketgarade commented 2 years ago

What exactly I would like to know, it is how to find the "length" of such string, in the sense of number of occupied characters.

Does this suffice?

  1. In a Linux shell (screenshot in the previous reply), it can be done with the word count command, wc. With the -m option, it outputs just the number of characters.

  2. In Python, using the len() function on a string. I tested it on Devanagari text, and the result matches the screenshot in the previous reply. Pasting it below for quick reference.

    
     ```python
     clear = "clear"
     kahise = "काहीसे"

     print("len of clear is", len(clear))
     print("len of kahise is", len(kahise))
     ```

     ------ Output ------

     ```
     len of clear is 5
     len of kahise is 6
     ```
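Both numbers above are code-point counts: Python's len() and wc -m (in a UTF-8 locale) count Unicode characters, while wc -c counts bytes. A short sketch shows the difference for the same word and also counts how many of its code points are combining marks (the vowel signs here), which is why 6 "characters" make up only three written syllables (का, ही, से):

```python
import unicodedata

word = "काहीसे"

code_points = len(word)                 # what len() and `wc -m` (UTF-8 locale) count
utf8_bytes = len(word.encode("utf-8"))  # what `wc -c` counts
# Combining marks (categories Mn/Mc/Me): here the vowel signs ा, ी and े.
marks = sum(unicodedata.category(ch).startswith("M") for ch in word)

print(code_points, utf8_bytes, marks)  # prints: 6 18 3
```

When feeding the word to wc from the shell, note that echo appends a newline that wc -m also counts; printf '%s' avoids it.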

chubin commented 2 years ago

Thank you for your help, but that is not exactly what I mean. I am looking for a reliable way to know how many terminal positions a Devanagari word occupies (not how many "characters" it has, because a character may take one or two places, or maybe even zero, because it can be just an accent).

For example:

```
पावसाच्या हलक्या सरी
123456789012345
```

Here we see that the phrase occupies 15 positions, but len() returns 20:

```
>>> len("पावसाच्या हलक्या सरी")
20
```

This is understandable, because len() knows nothing about Devanagari rendering; it only knows how many Unicode characters (code points) the string has.

For each script there are libraries that could help solve this problem, so the question is how to do it for Devanagari.
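A library-free approximation is to group each base character with the combining marks that follow it, which is roughly what Unicode calls a grapheme cluster (UAX #29; the third-party Python regex module implements the full rules as \X). The sketch below also has an optional, assumed rule that glues a letter following a virama onto the previous cluster to approximate a conjunct. Both rules are simplifications, and the function name clusters is hypothetical.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA


def clusters(text: str, join_conjuncts: bool = False) -> list[str]:
    """Group each character with the combining marks that follow it.

    This is a simplified stand-in for Unicode grapheme clusters (UAX #29).
    With join_conjuncts=True, a letter that directly follows a virama is also
    appended to the previous cluster, approximating conjuncts such as "क्य".
    """
    result: list[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")  # Mn, Mc or Me
        joins_conjunct = (
            join_conjuncts
            and bool(result)
            and result[-1].endswith(VIRAMA)
            and category == "Lo"
        )
        if result and (is_mark or joins_conjunct):
            result[-1] += ch
        else:
            result.append(ch)
    return result


if __name__ == "__main__":
    phrase = "पावसाच्या हलक्या सरी"
    print(len(clusters(phrase)), len(clusters(phrase, join_conjuncts=True)))
```

On "पावसाच्या हलक्या सरी" this gives 13 clusters, or 11 with join_conjuncts=True; neither equals the 15 positions observed in the example above, so counting clusters alone does not reproduce the terminal's layout.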

sanketgarade commented 2 years ago

OK, now I understand the problem statement. Basically, you want to find the number of English/Latin character spaces needed for a given Devanagari string. Unfortunately, I don't know of any library that can do this.

But I think there may be an indirect way of doing it (though I am not sure whether this method is suitable for this application):

  1. Using hb-view and file, as in the example I gave in one of the comments above, we can find the width in pixels needed for the Devanagari string; call it pixel-len-dev.
  2. In the same way, we can find the pixel width of any English character ("any" because in a monospace font all English characters take up the same number of pixels); call it pixel-len-en.
  3. Dividing pixel-len-dev by pixel-len-en then gives the number of English character spaces that the Devanagari string takes up.
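The three steps can be sketched in Python. As an assumption, this swaps hb-view and file for Pillow's FreeTypeFont.getlength() with the RAQM layout engine (libraqm shapes text with HarfBuzz), which returns the advance width in pixels directly; measure_cells and cells_from_pixels are hypothetical names, and rounding up to a whole cell is a design choice. It also assumes a single monospace font that covers Devanagari, whereas real terminals often fall back to another font for Devanagari, so results may differ from what a terminal actually shows.

```python
import math


def cells_from_pixels(pixel_len_dev: float, pixel_len_en: float) -> int:
    """Step 3: convert a pixel width into a count of Latin monospace cells.

    Rounds up, since the text has to fit, while tolerating tiny float error.
    """
    if pixel_len_en <= 0:
        raise ValueError("pixel_len_en must be positive")
    return math.ceil(pixel_len_dev / pixel_len_en - 1e-6)


def measure_cells(text: str, font_path: str, size: int = 20) -> int:
    """Steps 1 and 2: measure `text` and one Latin letter with the same font.

    Assumes `font_path` is a monospace font that also has Devanagari glyphs and
    that Pillow was built with libraqm.
    """
    # Imported here so cells_from_pixels() can be used without Pillow installed.
    from PIL import ImageFont

    font = ImageFont.truetype(font_path, size, layout_engine=ImageFont.Layout.RAQM)
    pixel_len_dev = font.getlength(text)  # shaped advance width, in pixels
    pixel_len_en = font.getlength("M")    # any Latin letter works in a monospace font
    return cells_from_pixels(pixel_len_dev, pixel_len_en)
```

For example, measure_cells("पावसाच्या हलक्या सरी", "SomeMonoDevanagari.ttf") would return the estimated cell count for the phrase; the font file name is only a placeholder.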