bazelbuild / stardoc

Stardoc: Starlark Documentation Generator
Apache License 2.0
103 stars 40 forks source link

Unable to distinguish between args and kwargs in function signatures; varargs ordering lost #225

Closed rickeylev closed 3 weeks ago

rickeylev commented 1 month ago

There's an edge case involving *args and **kwargs that can't be distinguished with the current stardoc protos:

def foo(*x): pass
def bar(**x): pass

In both cases, the proto output will be:

name: "x"
mandatory: false

This means the original signature can't be accurately recovered.

Special casing names like args and kwargs would probably work most of the time, but isn't a guarantee, as users could name these arguments anything.

It might be possible for consumers to distinguish between *args and **kwargs if they keep track of the order of mandatory-ness, but I'm not entirely sure. Basically, I think if you see two non-optional args without a default after a mandatory arg, you know the first is *args and the second is **kwargs.

IMHO, it would be better if the protos just indicated directly if an arg is a * or ** by having another field on the proto: optional ArgKind kind, with values:

(The python docs about Parameter.kind might also be informative: https://docs.python.org/3/library/inspect.html#inspect.Parameter.kind)

rickeylev commented 1 month ago

It might be possible for consumers to distinguish between *args and **kwargs

So this ends up being possible for many, but not all cases. What you can do is:

For posterity, here's an rough implementation of the above:

var_args_param = None
var_kwargs_param = None
saw_mandatory_after_optional = False
optionals_started = False

for is_last, p in parameters:
  optionals_started = optionals_started or not p.mandatory
  if p.mandatory and optionals_started:
    saw_mandatory_after_optional = True
  if not p.default and not p.mandatory:
    if var_args_param is None:
      var_args_param = struct(is_last=is_last, p=p)
    else:
      var_kwargs_param = p

if var_args_param and not var_kwargs_param:
  if not var_args_param.is_last:
    pass
  elif saw_mandatory_after_optional:
    var_kwargs_param = var_args_param.p
    var_args_param = None
  elif var_args_param.p.name in ("kwargs", ...):
    var_kwargs_param = var_args_param.p
    var_args_param = None

The name checking hueristic is...a hueristic. It's not uncommon to use an alternative name that has more semantic meaning.

tetromino commented 1 month ago

The real fix for both this issue and #226 require changes to the proto emitted by starlark_doc_extract. I can see 3 possible solutions:

  1. Change the parameter names and order in proto output: emit "*args" and "**kwargs" instead of "args" and "kwargs", put "*args" in the right position, and insert a fake "*" parameter for positional/keyword-only boundary marker. This does not require any change in the proto definition, is very easy to use for consumers of the proto to understand, and I have the Stardoc side implemented experimentally in https://github.com/tetromino/stardoc/tree/star. The downside of this approach is migration: the new semantics of the proto will break rendering by older versions of stardoc, and therefore will require an incompatible flag in bazel.
  2. Change the proto format and add an enum to represent each parameter's semantic role: unspecified (legacy default), positional, keyword-only, residual args, residual kwargs. It's redundant, but has the benefit of easier migration (no need for an incompatible flag in bazel).
  3. Change the proto format and add the count of positional, keyword-only, and residual parameters to the function info (i.e. expose them from the underlying StarlarkFunction java object to the proto). Although this approach looks tempting, I am leaning against it because e.g. there is no way for the renderer to distinguish a proto emitted by Bazel 8 for foo(args) from a proto emitted by Bazel 7 for foo(**args).
rickeylev commented 1 month ago

Thanks for responding, @tetromino

tldr: As a user, Option 2 is the most appealing

returning *args/**kwargs names will break rendering

How so? I haven't checked, but from what I remember:

Some special casing of how links are generated will probably be needed (i.e stripping the * character so the anchor ids are still foo.args instead of foo.*args (or some such).

No way for the render to distinguish a proto from bazel 7 vs 8

You could stick a bazel_version field in the proto. This might be useful in general -- when generating cross references, it makes it possible to link to the versioned bazel docs. That said, it's fairly easy to pass along the bazel version to a renderer some other way (the same as any rule would want to know the bazel version). TBH, I'm not sure I'd make use of it in my case. When linking to external docs, I think linking to the latest version is a better user experience.

My personal preference disfavors an overall version field to interpret the meaning of data. I'm more of a fan of feature detection (e.g. option 2)

expose the underlying count of positional, keyword only, and residual parameters

While this sounds like enough to recover the original signature, it also sounds like an awkward API. It's much cleaner if each Parameter carries info about itself. But it works, so I can't complain too much -- it's your code and api :).

I would argue that fixing the order *args isn't a breaking change -- it's fixing a bug.

Just another idea: a transitional arg on the starlark_doc_extract rule? IDK if that really changes things, just throwing it out there.

As a user, if the choice is between "wait for bazel 8 because its considered a breaking change" and "wait for next 7.x because there's now additional info in the proto", I'd would take the latter. Protos are naturally pretty good at this sort of expansion, so checking for an extra field is easy. i.e. my preference is option 2.