golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.35k stars 17.38k forks source link

proposal: go/doc: ignore leading and trailing whitespace in the directive arguments #68092

Open jimmyfrasche opened 3 weeks ago

jimmyfrasche commented 3 weeks ago

Proposal Details

51082 and #37974 standardized Go directive comments as line comments whose content matches ^(line |extern |export |[a-z0-9]+:[a-z0-9])(.*)$ where the first group is the directive and the second group is the argument to the directive.

This makes leading and trailing whitespace part of directive comments.

Trailing whitespace is almost always removed and should never be semantic so it is already de facto not part of the argument.

One space is required for the old (line/export/extern) directives and additional leading whitespace is presumably ignored [have not tested this, will update if incorrect].

Someone writing a general //x:y directive in their code will likely add whitespace between the directive and argument even if not syntactically required so anyone parsing directives will likely ignore at least one leading space regardless.

Someone reading the directive //x:yZ would assume that x:yZ is the directive name rather than it is the directive x:y with argument Z so having whitespace here would clarify.

I propose changing the argument syntax from .*$ to [ \t]*([^ \t]*)[ \t]*$ where the leading and trailing whitespace is ignored.

This would allow two things

  1. The iterator proposed in #68021 could trim the arguments automatically so no one using the official API for parsing directives will need to worry about this
  2. (not proposed here but enabled by this change) gofmt could insert a clarifying space when necessary, normalize leading whitespace, or even line up arguments to multiple directives
gabyhelp commented 3 weeks ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

ianlancetaylor commented 3 weeks ago

The current gc compiler requires at least one space character after the directive name. However, for directives that take arguments, the compiler consistently treats a sequence of whitespace characters (per unicode.IsSpace) as a single whitespace character. So trailing whitespace is currently ignored and leading whitespace is ignored other than requiring a single space character.

For what it's worth the current version of gccgo is stricter: it ignores sequences of space and tab characters, but does not permit arbitrary Unicode whitespace characters.

So if we made this change, one compatibility concern would be people who wrote a directive followed by a tab character. Such directives would have their expected meaning in a new version of Go, and would be ignored by an old version of Go. This could potentially lead to confusion if someone wrote code for a newer version and then compiled it with an older version. I don't think that is worth worrying about but I want to call it out.

The other compatibility concern if we adopt the proposal precisely as written is that currently Unicode whitespace characters like U+00A0 are treated as space, but this proposal suggests just space and tab. So I do think that the regexp we use, if we change this, should replace |\t with [[:space:]]|\pZ.

jimmyfrasche commented 3 weeks ago

I'm certainly fine with unicode.IsSpace. I don't know how often unicode spaces end up in comments but there's no reason for invisible differences to have distinct effects.