Closed tamo closed 9 months ago
For @Lexikos to decide.
I think that it is sufficient to state in e.g. remarks that the output for such characters may be different than expected, including an example.
In my opinion, replacing "width ... in characters" with "number of characters" or even "length" is not an improvement in terms of clarity, since an emoji or similar can also be interpreted as a single character, so the user would still encounter the problem.
Yes, remarks can be sufficient. I won't insist on rewording all the "width"s.
FYI, width and length are clearly different words for those who write command line programs. For example, POSIX has wcswidth and wcslen as separate functions (wcslen is also in C99). wcswidth is really difficult to implement but wcslen is relatively easy. Personally, I'd like to replace the "width"s if they actually mean "the number of characters", because in that case they are simply wrong.
In case you are interested in real examples:
msg := ""
for(str in [
    "🇺🇸",
    "💪",
    "👨‍👩‍👧‍👦"
]) {
    msg .= str
    slen := StrLen(str)
    msg .= Format(" (strlen={:d})`n", slen)
    ; Width only: pads to at least i characters, never truncates.
    loop(slen) {
        i := A_Index
        msg .= Format(" [{}:{:" i "s}]", i, str)
    }
    msg .= "`n"
    ; Width and precision: pads to at least i, truncates to at most i.
    loop(slen) {
        i := A_Index
        msg .= Format(" [{}.{}:{:" i "." i "s}]", i, i, str)
    }
    msg .= "`n`n"
}
MyGui := Gui()
MyGui.Add("Text",, msg)
MyGui.Show()
Most of the format specifiers are implemented via printf, and our documentation intentionally uses the same terminology as the Microsoft documentation (though intentionally not copying the copious amount of detail). This is the width specification; what else could it be but the width of the formatted value? Replacing "width" with "length" only confuses matters by removing the obvious connection between that sentence and the corresponding part of the format specifier which is named "Width".
When there is only one dimension being measured, how could a meaningful distinction be made between "width" and "length"? Even with three dimensions, if you're measuring a box, does it matter which side is the "width" and which is the "length"?
It is really hard to calculate characters' width ... Microsoft doesn't say their printf calculates the width:
Our documentation doesn't say that either. It is "the width, in characters", not "the width of the characters".
If Format() really accepted width, I would expect that Format("{:2}", "😀") returns "😀" (only a smile) instead of " 😀" (a space and a smile).
It does return only a smile, so it seems to be meeting your expectation, contrary to what you seem to be saying.
Either way, I don't understand your reasoning. I'd guess that you are ascribing some meaning to "width" and/or "characters" that I don't agree with.
The meaning of "characters" isn't strictly defined, although the semantic note under Unicode vs ANSI indicates that supplementary characters such as this one are usually treated as two "characters". Most of the formatting is done by the C runtime, so whether a supplementary character is treated as 1 or 2 characters in this context is up to the C runtime.
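To make the "1 or 2 characters" point concrete, here is a minimal sketch, assuming AutoHotkey v2 (where strings are UTF-16) and using U+1F600 as a stand-in supplementary character. The padded result is what I would expect if the C runtime counts UTF-16 code units; I haven't checked it against every runtime version.

```autohotkey
; Build the character from its code point to avoid file-encoding issues.
smile := Chr(0x1F600)

units := StrLen(smile)              ; 2: the character is stored as a surrogate pair
exact := Format("[{:2s}]", smile)   ; already 2 units long, so no padding is added
wider := Format("[{:3s}]", smile)   ; expected: one space of padding before the smile

MsgBox "StrLen: " units "`n" exact "`n" wider
```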
examples
That just seems to demonstrate that "Width" controls the minimum (and as such, doesn't truncate) and ".Precision" controls the maximum (and as such, does truncate). Is this not consistent with the documented behaviour?
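The same minimum/maximum behaviour is easier to see with plain ASCII, where one character is unambiguously one unit. A small sketch, assuming AutoHotkey v2; the variable names are only for illustration:

```autohotkey
; Width is a minimum: short values are padded, long values are left intact.
padded := Format("[{:5s}]", "ab")        ; "[   ab]"
intact := Format("[{:2s}]", "abcdef")    ; "[abcdef]"

; .Precision is a maximum for strings: longer values are truncated.
cut    := Format("[{:.2s}]", "abcdef")   ; "[ab]"

; Both together: truncate to 2 characters, then pad to 5.
both   := Format("[{:5.2s}]", "abcdef")  ; "[   ab]"

MsgBox padded "`n" intact "`n" cut "`n" both
```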
I'd guess that "👨‍👩‍👧‍👦" is a series of combining characters which are rendered as a single glyph, or multiple glyphs overlaid. It is not a single character by any conventional definition that I know, although I'm not very familiar with terminology for rendering text.
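For what it's worth, the string can be broken down to check this. A minimal sketch, assuming AutoHotkey v2 and assuming the sample is the usual man, woman, girl, boy sequence joined by U+200D (zero-width joiner); CountCodePoints is a hypothetical helper, not a built-in.

```autohotkey
; Hypothetical helper: counts Unicode code points rather than UTF-16 code units.
CountCodePoints(str) {
    count := 0
    pos := 1
    len := StrLen(str)
    while pos <= len {
        ; Ord() decodes a surrogate pair at the start of the string, if there is one.
        cp := Ord(SubStr(str, pos, 2))
        pos += (cp > 0xFFFF) ? 2 : 1
        count += 1
    }
    return count
}

zwj := Chr(0x200D)
family := Chr(0x1F468) zwj Chr(0x1F469) zwj Chr(0x1F467) zwj Chr(0x1F466)

units := StrLen(family)               ; 11: four surrogate pairs plus three joiners
codePoints := CountCodePoints(family) ; 7: four emoji plus three joiners

MsgBox "UTF-16 code units: " units "`nCode points: " codePoints
```

Neither count is 1, which is why neither Width nor .Precision treats the whole sequence as a single character.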
Thanks for the comment, Lexikos! I won't insist on my change if you intentionally chose these words.
It is really hard to calculate characters' width. (See how long https://github.com/microsoft/terminal/issues/900 is.)
If Format() really accepted width, I would expect that Format("{:2}", "😀") returns "😀" (only a smile) instead of " 😀" (a space and a smile).
Microsoft doesn't say their printf calculates the width: https://github.com/MicrosoftDocs/cpp-docs/blob/main/docs/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions.md#width
We too could use technically correct words like "the minimum number of characters".