Open jimmyfrasche opened 1 month ago
cc @griesemer per owners
Perhaps it should be Directives(namespaces ...string)
. That would make it cleaner in the rare case you want all directives and, more importantly, simpler in the case where you want your namespace but also need to, for example, account for directives from another tool that you're replacing or just need to interoperate with.
cc @findleyr @adonovan for visibility.
CC @lfolger, since we were just discussing the lack of an API like this one.
At a high level I do think we should expose this in an API. The proposal looks reasonable to me. I suspect that the variadic form is overkill, preferring the original proposal, but I could be convinced otherwise.
I'm not convinced the namespaces argument is necessary: the iterator must always visit every node, so there's no efficiency gain by having the iterator perform the filtering, and the non-monotonic behavior for len(namespaces)=0 seems undesirable. Make the iterator yield them all, and let the caller filter.
Perhaps we should define a new type, type Directive { Namespace, Name, Arguments string }
, and just return a plain Seq[Directive]
. That alleviates the caller from thinking about parsing the first element, or from getting confused as to how the three values are split into the two parts of a Seq2.
:+1: on a new type. Though maybe it should just be type Directive { Namespace, Name string }
and iter.Seq2[Directive, string]
for the arguments? That would make it simpler to use the directive as a map key.
The variadic proposal may be overkill. I'm not entirely confident. The main argument is performance, as I imagine the iterator would parse the comments for each invocation. If you do need to check multiple namespaces you'd either need to parse the comments n times or iterate over all the directives and implement your own filtering logic and I based it off #67795 so I figured the non-monotonic behavior would be acceptable. OTOH a multi-namespace filter may need to do some allocations or preprocessing and those would likely be the same for the entire program so having it as a separate filter would
I do think the most common case is going to be a tool looking for its directives so having some simple namespace filtering built in is going to simplify the majority of callers who would otherwise all have to implement their own filtering. For example, the code I was writing that led me to file this only cares about two directives in its custom namespace so it would be a lot simpler to just write g.Directives("mystuff")
than
for dir := range g.Directives() {
if dir.Namespace != "mystuff" {
continue
}
// ...
}
Built in filtering can also handle the special cases of extern, export, and line, which have no namespace but you'd want to show up in a query for the "go" namespace. Of course, if there's a type it could export a method to handle this.
That would make it simpler to use the directive as a map key.
But it's not a map, it's a sequence of pairs whose keys may be duplicates.
I mean that if I wanted to collect results as map[Directive][]T
where type T struct{ Node; args string}
and Arguments
is part of Directive
I'd need to shuffle things around but if they're separated it's more straightforward.
I mean that if I wanted to collect results as
map[Directive][]T
wheretype T struct{ Node; args string}
andArguments
is part ofDirective
I'd need to shuffle things around but if they're separated it's more straightforward.
That's true, but most clients will discard most directives (at least ones of the wrong namespace), so a map[string][]T
will do, and three-field Directive struct gives us room for future improvements (e.g. a method to parse arguments in some emerging future standard way).
For export
, extern
, and line
, I think the Namespace
field of Directive
should be set to "go"
. Fudging that removes an edge case. There could be a String
method that renders them without the go:
The updated proposal so far is:
// Directives yields each [directive comment] in g as a [Directive].
//
// [directive comment]: https://go.dev/doc/comment#syntax
func (g *CommentGroup) Directives() iter.Seq[Directive] {
// ...
}
type Directive struct {
Namespace, Name, Arguments string
}
For line
/export
/extern
directives, there are two choices:
Namespace
as ""
Namespace
as "go"
1 is syntactically correct. There is no namespace for such directives as they are written.
2 is semantically correct. Those directives belong to Go and, as such, implicitly belong to the "go"
namespace.
These directives need to be special cased in parsing so manually setting the namespace to "go"
is trivial.
For rendering to a string, these need to be special cased either way (to avoid rendering ":line"
or "go:line"
instead of "line"
) and as such Directive
should be a fmt.Stringer
.
So for std these need to be special cased and documented either way.
For user code, 2 removes a special case as Namespace != ""
and everything that belongs to Go is labeled "go"
.
However, it's unlikely that anyone would care about anything other than a small fixed set of namespaces, often a singleton, and it's likely that this set contains neither "go"
nor ""
, so few users would even be in a situation where they could run into this.
These are both right. 2 seems more right to me. But ultimately it doesn't seem likely to matter much in practice and should be documented whatever the decision.
It looks like Arguments
is pretty free form. Are the following correct or did I misunderstand something:
Directive → Arguments (notes):
"//0:0 a \n"
→ " a "
(two spaces on either side)
"//a:aA\n"
→ "A"
"//line X\n"
→ "X"
(first space is part of name)
If possible, it would be nice to further standardize that leading and trailing white space is ignored (and have gofmt normalize it to a single space). I can open another issue, if that's a possibility.
edit, the respective hypothetical normalizations would be:
"//0:0 a \n"
→ "//0:0 a\n"
"//a:aA\n"
→ "//a:a A\n"
edit 2 corrected //a:A
example to //a:aA
I filed #68061 to simplify the namespace question I posted earlier today by litigating it out of existence
I'm not sure I understand
"//a:A\n" → "A"
I would expect it to lead to
Directive{Namespace: "a", Name: "A", Arguments: ""}
Why Is "A" and argument and not the name? Is this to make the normalization easier?
That was a bug in the comment. Updated it to:
"//a:aA\n"
→ "//a:a A\n"
The A isn't lowercase so it's not part of the directive name so it's part of the arguments. Allowing and normalizing leading whitespace would make that clearer.
I went ahead and filed #68092 for the argument whitespace, if it's possible to change that.
Summary of how these proposals fit together.
This proposal is for:
func (g *CommentGroup) Directives() iter.Seq[Directive] {
// ...
}
type Directive struct {
Namespace, Name, Arguments string
}
If #68061 is accepted, then it's invariant that Namespace != ""
and all the Go directives get Namespace == "go"
If #68092 is accepted, then it's invariant that Arguments == strings.TrimSpace(Arguments)
With both accepted, any Directive
can be turned back into a string with fmt.Sprintf("%s:%s %s", d.Namespace, d.Name, d.Arguments)
; otherwise, Directive
will require a String
method to handle the various special cases (though it could certainly still have a String
method).
Should it be Argument
singular since there's only one of them?
Proposal Details
*ast.CommentGroup
has a helpfulText
method to extract the text of comments that are not directives, but there is no simple means to get the directives of the comment other than by parsing the raw comments.Now that directives are standardized and allow third party directives, while the format is simple, it would be nice to make them easier to access.
The most common case would be for a third party tool to want to see only the directives in its namespace so this use case should be made simplest.
I imagine something like: