chubin opened 2 years ago
Hi!
Fix formatting issues in browser
By this, do you mean fixing the unaligned lines and text so that they are aligned with the rest of the layout, like in the image below from mr.wttr.in?
Can you please provide some direction/steps on how to do this? I will try to do it for the Marathi language.
@sanketgarade This is a good question, and thank you very much for helping me with it. I don't know Devanagari at all (maybe it is a good opportunity to learn it; I even started once, but quickly concluded that it would take quite a lot of time).
So the first and most important question is: which terminal should we use as a reference? I think I shared a screenshot of alacritty's rendering on Twitter, and you (I believe it was you) said that it was wrong. So should we stick to konsole?
Yes, initially let's stick to konsole, because I have seen that Devanagari renders properly in it if the correct font is chosen. That is not the case with most other terminals.
Do you know a good library (Python, Go, C, or actually any language) or algorithm for getting the visible length of some Devanagari text?
Hey, sorry, I don't know either. I will see if I can get help from someone.
By "visible length" do you mean the length of the displayed text in pixels (or any other unit of length)?
@sanketgarade Sorry for disappearing; now I am back and would like to return to Devanagari rendering.
By "visible length" do you mean the length of the displayed text in pixels (or any other unit of length)?
Yes. I don't know if it applies to Devanagari script at all, but in some form I think it does: if you print some text on the terminal, it occupies a certain number of character positions.
With Devanagari it is a bit more complicated, because some characters have zero width and are just modifiers of the previous character, and some characters (maybe; I don't know, but I hope you can shed more light on it) can occupy a non-integer number of "terminal positions".
In any case, by visible length I mean "the number of terminal positions the text occupies when it is displayed".
Are you aware of any functions in any language that could do this?
Hey @chubin
Thanks for the information. Actually, I haven't been able to work on this myself either, due to lack of time. But I was able to connect with someone who works on Indic projects. Here is the information I received from them (though I haven't personally checked it in detail yet):
Automated Rendering Testing using harfbuzz.
Using hb-view (reference) and piping the output to file, the size of the rendering can be seen in the output. See image below.
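The hb-view + file step can be sketched in Python roughly as below. This is an illustration, not the referenced project's code: it assumes hb-view is on the PATH, that it writes the image to stdout when no output file is given, and that its --output-format and --margin options behave as described in the comments; render_png and png_size are hypothetical helper names. Instead of calling file, it reads the width and height straight from the PNG IHDR header, which is where file gets those numbers.

```python
import struct
import subprocess
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) of a PNG image, the numbers `file` prints."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR" tag,
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("data is not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def render_png(font_path: str, text: str) -> bytes:
    """Hypothetical helper: render `text` with hb-view and return the PNG bytes.

    Assumes hb-view is on PATH and writes to stdout when no output file is
    given; --margin=0 asks it not to add a blank border around the text.
    """
    result = subprocess.run(
        ["hb-view", "--output-format=png", "--margin=0", font_path, text],
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Usage would be something like `png_size(render_png("/path/to/devanagari-font.ttf", "काहीसे"))`, where the font path is a placeholder. Parsing the header directly avoids depending on the exact wording of file's output.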
The script in the image above is Malayalam, not Devanagari, but I think it should work similarly.
I'm not sure if this is what you are looking for. Let me know what you think.
It is a very interesting project; thank you very much for referencing it. It is not exactly what I am looking for, but maybe it could be helpful.
What I would like to know is how to find the "length" of such a string, in the sense of the number of occupied characters. For example, the first word from your example takes 4 characters, and the second one 7 characters.
Only in the terminal, I suppose. In a browser it would take more or fewer positions, I believe, but in the terminal exactly this width. Maybe it depends on the terminal, or maybe it is the same in all terminals.
Maybe we should just make some "external" function that returns the width of any input string, and make it configurable per terminal? Anyway, for a start, it would be great to find at least one such function.
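The idea of a pluggable, per-terminal width function could look roughly like this. It is only a sketch: the registry and the names register_width_function, text_width and pad_to are hypothetical, and the only built-in entry is the naive one-cell-per-code-point fallback (len()), which is exactly what breaks for Devanagari. A proper entry for konsole would still have to be found.

```python
from typing import Callable, Dict

# A width function maps a string to the number of terminal cells it occupies.
WidthFunction = Callable[[str], int]

# Hypothetical registry: terminal name -> width function.
_WIDTH_FUNCTIONS: Dict[str, WidthFunction] = {}


def register_width_function(terminal: str, func: WidthFunction) -> None:
    """Register (or replace) the width function used for `terminal`."""
    _WIDTH_FUNCTIONS[terminal] = func


def text_width(text: str, terminal: str = "default") -> int:
    """Width of `text` on `terminal`, falling back to the default function."""
    func = _WIDTH_FUNCTIONS.get(terminal, _WIDTH_FUNCTIONS["default"])
    return func(text)


def pad_to(text: str, cells: int, terminal: str = "default") -> str:
    """Right-pad `text` with spaces so that it fills `cells` terminal positions."""
    return text + " " * max(0, cells - text_width(text, terminal))


# Naive fallback: one cell per code point, i.e. len(); wrong for Devanagari.
register_width_function("default", len)
```

The layout code would then call pad_to() instead of str.ljust(), and supporting another terminal would only mean registering one more function, for example a hypothetical konsole_width once someone knows how to implement it.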
Hey, sorry, I was away from this thread.
Anyway, about this:
Only in the terminal, I suppose. In a browser it would take more or fewer positions, I believe, but in the terminal exactly this width. Maybe it depends on the terminal, or maybe it is the same in all terminals.
I think the length in characters is the same in the terminal and in the browser.
I did a small experiment to check this.
In the browser, I opened mr.wttr.in and copied the word "काहीसे" from it. When copying it, the length is shown as "6 characters".
In the terminal, I typed the same word and ran a word count (wc) on it, which also gave a length of 6 characters.
So both results match.
What I would like to know is how to find the "length" of such a string, in the sense of the number of occupied characters.
Does this suffice?
In the Linux shell (screenshot in the previous reply), it can be done using the word count command wc. Using it with the -m option outputs just the number of characters.
In Python, it can be done using the len() function on a string. I tested it here on Devanagari script, with results matching the screenshot in the previous reply. Pasting it below for quick reference.
clear = "clear"
kahise = "काहीसे"
print("len of clear is", len(clear))
print("len of kahise is", len(kahise))
------ Output ------
len of clear is 5
len of kahise is 6
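To see what len() is actually counting, the word can be broken into its Unicode code points with the standard unicodedata module. This is just an inspection aid; describe_code_points is a name made up for this illustration.

```python
import unicodedata


def describe_code_points(text: str) -> list:
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


for row in describe_code_points("काहीसे"):
    print(*row)
```

It lists three letters (category Lo: KA, HA, SA) and three vowel signs (AA and II are spacing marks, category Mc; E is a nonspacing mark, category Mn). The signs attach to the letters to form the three syllables का, ही, से, so "6 characters" means six code points, not six visible positions.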
Thank you for your help, but it is not exactly what I mean. I am looking for a reliable way to know how many character positions a Devanagari word occupies in the terminal (not how many "characters" it has, because a character can take one or two positions, or maybe even zero, since it can be just an accent).
For example:
पावसाच्या हलक्या सरी
123456789012345
Here we see that the phrase occupies 15 positions, but len() returns 20:
>>> len("पावसाच्या हलक्या सरी")
20
This is understandable, because len() knows nothing about Devanagari rendering; it only knows how many Unicode characters (code points) the string has.
For many scripts there are libraries that could help solve this problem, so the question is how to do it for Devanagari.
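A common first approximation, roughly the convention followed by wcwidth()-style functions, is to give zero width to nonspacing marks, enclosing marks and format characters, two cells to East Asian wide characters, and one cell to everything else. The sketch below uses only the standard unicodedata module; approx_width is a hypothetical name. It is not a solution: for the example phrase it gives 18, closer than len()'s 20 but still not the 15 positions shown above, presumably because it looks at one code point at a time and does not model conjuncts (consonant + virama + consonant) or how the font shapes them.

```python
import unicodedata

# Categories treated as zero-width: nonspacing marks (e.g. virama, vowel sign E),
# enclosing marks, and format characters such as ZWJ/ZWNJ.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def approx_width(text: str) -> int:
    """Approximate terminal width, one code point at a time (no shaping)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
            continue  # attached to the previous character, no cell of its own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide/fullwidth characters take two cells
        else:
            width += 1  # includes spacing vowel signs (Mc) such as AA and II
    return width


phrase = "पावसाच्या हलक्या सरी"
print(len(phrase), approx_width(phrase))  # 20 18
```

The gap between 18 and 15 is a useful test case: any candidate width function for konsole should reproduce the 15.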
OK, now I understand the problem statement. Basically, you want to find out the number of English/Latin character spaces needed for a given Devanagari string. Unfortunately, I don't know of any library that can do this.
But I think there could be an indirect way of doing it (though I am not sure whether this method is suitable for this application):
1. Using hb-view and file, we can find out the length in pixels needed for the Devanagari string, say pixel-len-dev.
2. In the same way, find the length in pixels of a single English character, say pixel-len-en.
3. By dividing pixel-len-dev by pixel-len-en, we would get the number of English character spaces that the Devanagari string takes up.
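The division of pixel-len-dev by pixel-len-en can be written as a small function. This is a sketch under assumptions that are not part of the suggestion above: the Latin reference font is monospaced (so one Latin character equals one terminal cell), the Latin width may be measured over several characters and divided by their count to reduce rounding error, any blank margin the renderer draws on each side is subtracted from both measurements, and a partially covered cell is counted as a whole one. cells_from_pixels and its parameters are hypothetical names.

```python
import math


def cells_from_pixels(pixel_len_dev: float, pixel_len_en: float,
                      en_chars: int = 1, margin: float = 0.0) -> int:
    """Estimate how many Latin character cells a rendered string spans.

    pixel_len_dev: rendered width of the Devanagari string, in pixels.
    pixel_len_en:  rendered width of `en_chars` Latin characters, in pixels.
    margin:        blank border the renderer added on each side, if any.
    """
    dev = pixel_len_dev - 2 * margin
    cell = (pixel_len_en - 2 * margin) / en_chars
    if cell <= 0:
        raise ValueError("reference width must be positive")
    # Round up: a partially covered cell still occupies a terminal position.
    # The small tolerance keeps float noise (e.g. 11.000000000000002) from
    # adding a spurious extra cell.
    return math.ceil(dev / cell - 1e-9)
```

With made-up numbers (not measurements), a 150 px string and a 10 px Latin cell give 15 cells, and 151 px gives 16. Rounding up rather than to the nearest integer is a design choice; a renderer that lets glyphs overhang slightly might call for rounding to nearest instead.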
There are several problems related to rendering complex scripts:
PNG rendering of complex scripts can be done using raqm (https://github.com/HOST-Oman/libraqm).