AutoHotkey / AutoHotkeyDocs

Documentation for AutoHotkey
https://autohotkey.com/

Format's "width" spec controls length, not width #678

Status: Closed (tamo closed this issue 9 months ago)

tamo commented 10 months ago

It is really hard to calculate characters' width. (See how long https://github.com/microsoft/terminal/issues/900 is.)

If Format() really accepted width, I would expect that Format("{:2}", "😀") returns "😀" (only a smile) instead of " 😀" (a space and a smile).

Microsoft doesn't say their printf calculates the width: https://github.com/MicrosoftDocs/cpp-docs/blob/main/docs/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions.md#width

We, too, could use technically correct wording such as "the minimum number of characters".

Ragnar-F commented 10 months ago

For @Lexikos to decide.

I think that it is sufficient to state in e.g. remarks that the output for such characters may be different than expected, including an example.

In my opinion, replacing "width ... in characters" with "number of characters" or even "length" is not an improvement in terms of clarity, since an emoji or similar can also be interpreted as a single character, so the user would still encounter the problem.

tamo commented 10 months ago

Yes, remarks can be sufficient. I won't insist on rewording all the "width"s.

FYI, width and length are clearly different words for those who write command line programs. For example, POSIX or C99 has wcswidth and wcslen as separate functions. wcswidth is really difficult to implement but wcslen is relatively easy. Personally, I'd like to replace "width"s if they actually mean "the number of characters" because they are simply wrong.

tamo commented 10 months ago

In case you are interested in real examples:

msg := ""
for(str in [
    "🇺🇸",
    "👪",
    "👨‍👩‍👧‍👦"
]) {
    msg .= str
    slen := StrLen(str)
    msg .= Format(" (strlen={:d})`n", slen)
    loop(slen) {
        i := A_Index
        msg .= Format(" [{}:{:" i "s}]", i, str)
    }
    msg .= "`n"
    loop(slen) {
        i := A_Index
        msg .= Format(" [{}.{}:{:" i "." i "s}]", i, i, str)
    }
    msg .= "`n`n"
}
MyGui := Gui()
MyGui.Add("Text",, msg)
MyGui.Show()

[image: screenshot of the GUI window showing the script's output]

Lexikos commented 9 months ago

Most of the format specifiers are implemented via printf, and our documentation intentionally uses the same terminology as the Microsoft documentation (though intentionally not copying the copious amount of detail). This is the width specification; what else could it be but the width of the formatted value? Replacing "width" with "length" only confuses matters by removing the obvious connection between that sentence and the corresponding part of the format specifier which is named "Width".

When there is only one dimension being measured, how could a meaningful distinction be made between "width" and "length"? Even with three dimensions, if you're measuring a box, does it matter which side is the "width" and which is the "length"?

> It is really hard to calculate characters' width ... Microsoft doesn't say their printf calculates the width:

Our documentation doesn't say that either. It is "the width, in characters", not "the width of the characters".

> If Format() really accepted width, I would expect that Format("{:2}", "😀") returns "😀" (only a smile) instead of " 😀" (a space and a smile).

It does return only a smile, so it seems to be meeting your expectation, contrary to what you seem to be saying.

Either way, I don't understand your reasoning. I'd guess that you are ascribing some meaning to "width" and/or "characters" that I don't agree with.

The meaning of "characters" isn't strictly defined, although the semantic note under Unicode vs ANSI indicates that supplementary characters such as this one are usually treated as two "characters". Most of the formatting is done by the C runtime, so whether a supplementary character is treated as 1 or 2 characters in this context is up to the C runtime.
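A minimal sketch of the point above, assuming a Unicode (UTF-16) build of AutoHotkey v2. The variable names are illustrative only. It builds the smile with Chr() so the script does not depend on the file's encoding, then compares StrLen() with the padding that Format() applies.

```autohotkey
; Assumption: AutoHotkey v2, where strings are UTF-16 and a supplementary
; character such as U+1F600 is stored as a surrogate pair (two code units).
smile := Chr(0x1F600)                ; the smile from the examples in this thread

units   := StrLen(smile)             ; counts UTF-16 code units, so 2 for a surrogate pair
exact   := Format("{:2}", smile)     ; width 2 is already met, so no padding is added
widened := Format("{:3}", smile)     ; width 3 exceeds the unit count by one

MsgBox "StrLen(smile) = " units "`n"
     . "{:2} -> [" exact "] (StrLen " StrLen(exact) ")`n"
     . "{:3} -> [" widened "] (StrLen " StrLen(widened) ")"
```

If the runtime counts the smile as two characters, the `{:2}` result is the smile alone and the `{:3}` result gains a single leading space, which matches the behaviour reported in this thread.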

> examples

That just seems to demonstrate that "Width" controls the minimum (and as such, doesn't truncate) and ".Precision" controls the maximum (and as such, does truncate). Is this not consistent with the documented behaviour?
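A short sketch of that distinction with plain ASCII input, assuming AutoHotkey v2. The variable names are hypothetical. Width only pads and precision only truncates.

```autohotkey
; Assumption: AutoHotkey v2. The brackets only make the padding visible.
padded    := Format("[{:5}]", "ab")        ; width 5 > length 2: left-padded   -> [   ab]
untouched := Format("[{:2}]", "abcdef")    ; width 2 < length 6: no truncation -> [abcdef]
cut       := Format("[{:.3}]", "abcdef")   ; precision 3: at most 3 characters -> [abc]
both      := Format("[{:5.3}]", "abcdef")  ; truncate to 3, then pad to 5      -> [  abc]

MsgBox padded "`n" untouched "`n" cut "`n" both
```

With emoji or other supplementary characters the same rules apply, but the counts are taken in the units the C runtime uses rather than in visible glyphs.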

I'd guess that "👨‍👩‍👧‍👦" is a series of combining characters which are rendered as a single glyph, or multiple glyphs overlaid. It is not a single character by any conventional definition that I know, although I'm not very familiar with terminology for rendering text.

tamo commented 9 months ago

Thanks for the comment, Lexikos! I won't insist on my change if you intentionally chose these words.