JelleZijlstra / autotyping

Automatically add simple type annotations to your code
215 stars 18 forks source link

Name-based type inference? #46

Closed Zac-HD closed 1 year ago

Zac-HD commented 1 year ago

👋 I just found this project, and --annotate-named-param reminds me of a weekend research project I did for the Hypothesis ghostwriter, analyzing a few hundred million arguments in a corpus of Python code: https://github.com/HypothesisWorks/hypothesis/issues/3311. The context is different enough between our projects that you might want to make your own decisions based on the table in that issue rather than just lifting code out of my PR (hereby offered under MIT).

Clauses like if name.startswith("is_") or name in BOOL_NAMES: aren't fully expressible with --annotate-named-param, so this might be worth a new --guess-common-names argument as well as a table of suggestions, e.g. "pattern is usually either str or re.Pattern[str], or sometimes bytes".

JelleZijlstra commented 1 year ago

Thanks for bringing this up! The data you gathered is definitely useful for a project like autotyping.

Zac-HD commented 1 year ago

Explicitly checking, would you accept a PR that adds a new --guess-common-names argument to autotyping for this?

JelleZijlstra commented 1 year ago

Yes

jakkdl commented 1 year ago

Should I only take the ones with a checkmark in the should guess? column - or have a separate "extended" flag that also includes the common names that "shouldn't" be guessed? (Or force the user to add those themselves with --annotate-optional and --annotate-named-param)

Should --guess-common-names also turn on guessing when the default is None, i.e. from

def foo(bar=None):
  ...

infer

def foo(bar: Optional[str] = None):
 ...

and/or should that be a separate flag, something like --guess-common-names-optionals

For non-bool defaults I don't see any reason to change anything since that's already inferred.

JelleZijlstra commented 1 year ago

Thanks for your work! I'd be curious to hear @Zac-HD's opinions too, but here are my reactions:

Zac-HD commented 1 year ago

I would not guess more speculative types; I think they're wrong too often to be useful. And instead of looking at the table, just grab the code that I wrote:https://github.com/HypothesisWorks/hypothesis/pull/3313/files#diff-9cc5e151a0e8c7f741d59c6406f5c0ac31f6284b29072f1dc264255a2dab5d91 (noting that it distinguishes some strategies that are the same type).

jakkdl commented 1 year ago

Cool - implemented it and pushed to #48, though leaving it as draft as there's a couple ones I'm unsure about:

JelleZijlstra commented 1 year ago
  • typing.Callable doesn't accept just setting the return value, so [if we're annotating generics] I'm just annotating this as a generic Callable.

It does, Callable[..., ReturnType]. But this has the same problems as with using List.

Zac-HD commented 1 year ago

Should generics be typed at all? They're almost always gonna need manual intervention anyway, so I'm unsure about the value of adding List as a type.

Let's just skip them then, the basics are remarkably helpful already and probably have a lower error rate.

jakkdl commented 1 year ago

well... I got feeling and wrote inferences for handling stuff like int_list and list_of_widths. But everything that can't be fully inferred is skipped. And more complicated generics like dict or callable are skipped.

Skipping uuid too - it seems to mostly be typed as str - but uuid.UUID or a hexadecimal int feels common enough that it should be skipped.