TotalVerb closed this 7 years ago
I think the first suggestion could work well. The problem with the second is that in Arabic, for example, letters have final forms, so where a letter sits in a string matters. This could make the answer incorrect if someone sums the widths of individual characters. Note: this is not a purely academic concern. Bad code like this was responsible for a bug that allowed a text message to crash iPhones.
Thanks, I was unaware of that. We should fix our implementation of `strwidth` then.
A quick bit of research suggests that doing this generally, correctly, and performantly is impossible. We have several options:

- Keep the current algorithm and produce slightly wrong results for some languages.
- Change algorithms, which would make Persian, Hebrew, Arabic, and others ~1000x slower.
- Throw an error if certain characters are in the input.
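To make the third option concrete, here is a minimal sketch of what "throw an error if certain characters are in the input" might look like. Everything in it is an assumption for illustration: `checked_width`, `is_risky`, and `RISKY_BLOCKS` are hypothetical names, the check covers only the Unicode Hebrew and Arabic blocks, and it is written against `textwidth`, the name under which Base later merged `charwidth` and `strwidth`.

```julia
# Hypothetical helper, not part of Base. The blocks below are the Unicode
# Hebrew (U+0590 to U+05FF) and Arabic (U+0600 to U+06FF) blocks; a real
# check would need a much fuller list of scripts.
const RISKY_BLOCKS = (('\u0590', '\u05ff'), ('\u0600', '\u06ff'))

# True if the character falls inside one of the listed blocks.
is_risky(c::Char) = any(b -> b[1] <= c <= b[2], RISKY_BLOCKS)

function checked_width(s::AbstractString)
    for c in s
        is_risky(c) && throw(ArgumentError("unsupported character $(repr(c)) in input"))
    end
    # No flagged characters: sum the per-character widths.
    # `init = 0` keeps the empty string well-defined.
    return mapreduce(textwidth, +, s; init = 0)
end
```

The trade-off is that callers get a loud failure instead of a silently approximate width, at the cost of one extra pass over the string.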
If someone wants a performant but incorrect algorithm, they can use `sum(charwidth, _)`. If we are going to provide `strwidth`, we may as well make it correct.
`strwidth(::Char)` seems to make sense, since it could act as if you converted the `Char` to a string, except possibly with a more efficient algorithm.
+1 for merging but not recommending `sum(charwidth, s)`. Correctness is what matters, in particular in the present case, where displaying the result is likely to be much slower than computing its width.
It's not like `charwidth` is unambiguously "correct" either. utf8proc reports a reasonable value, but ultimately this depends on the font and terminal settings; see e.g. the discussion at JuliaLang/utf8proc#83.
Can you give a specific example of a Unicode string for which `sum(charwidth, s)` gives an unambiguously wrong result?
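For anyone who wants to go looking for such a string, a small comparison harness might help. This is only a sketch: `width_by_chars` is a hypothetical name, the sample strings are arbitrary, and the code assumes a Julia version where `textwidth` is available in Base as the merged successor of `charwidth` and `strwidth`.

```julia
# Per-character sum, i.e. the `sum(charwidth, s)` spelling discussed above,
# written against Base.textwidth. `init = 0` handles the empty string.
width_by_chars(s::AbstractString) = mapreduce(textwidth, +, s; init = 0)

# ASCII, "e" followed by a combining acute accent, and two CJK ideographs.
samples = ["hello", "e\u0301", "\u6f22\u5b57"]

for s in samples
    println(repr(s), ": per-char sum = ", width_by_chars(s),
            ", textwidth = ", textwidth(s))
end
```

Any string where the two columns disagree, or where both disagree with what a given terminal actually renders, would be a candidate answer to the question.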
@oscardssmith, I'm skeptical that incorrect glyph width (i.e. computation of on-screen character size) caused the iPhone crash. I can't find a lot of technical information on the iPhone crash you seem to be referring to, but what little I can find seems to indicate that it stemmed from incorrect computation of the in-memory width (i.e. the width in bytes) of a character, which has nothing to do with the on-screen width.
My point would be that deprecating `charwidth` to `strwidth(::Char)` or `textwidth(::Char)` seems much safer than deprecating `strwidth` to `sum(charwidth, s)`. The latter assumes much more (even if that assumption is generally valid). `textwidth` is also extensible to other objects that might represent text, regardless of how many characters they contain or whether they are iterable.
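The extensibility point can be sketched as follows. `StyledText` is a hypothetical type invented for this example, and the code assumes a Julia version in which `textwidth` exists in Base with methods for `Char` and `AbstractString`.

```julia
# Hypothetical text-like type, used only for illustration: a string plus
# styling information that does not affect its on-screen width.
struct StyledText
    text::String
    bold::Bool
end

# Extend Base.textwidth so callers can ask for the width without knowing
# whether StyledText is iterable or how many characters it contains.
Base.textwidth(t::StyledText) = textwidth(t.text)

# The same generic function now covers a Char, a String, and StyledText.
for x in ('a', "abc", StyledText("abc", true))
    println(typeof(x), " => ", textwidth(x))
end
```

A caller written against `sum(charwidth, s)` would instead require `StyledText` to be iterable over characters, which is the extra assumption mentioned above.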
@stevengj, that may be right. I was basing it on a video by Tom Scott: https://www.youtube.com/watch?v=hJLMSllzoLA
Since these functions are very similar, I wonder if it is worth merging them into a single function, maybe `textwidth`. Alternatively, perhaps `strwidth(_)` can be spelt `sum(charwidth, _)`.