Closed (Zac-HD closed 1 year ago)
Thanks for bringing this up! The data you gathered is definitely useful for a project like autotyping.
Explicitly checking, would you accept a PR that adds a new `--guess-common-names` argument to autotyping for this?
Yes
Should I only take the ones with a checkmark in the "should guess?" column, or have a separate "extended" flag that also includes the common names that "shouldn't" be guessed? (Or force the user to add those themselves with `--annotate-optional` and `--annotate-named-param`.)
Should `--guess-common-names` also turn on guessing when the default is None, i.e. from
def foo(bar=None):
    ...
infer
def foo(bar: Optional[str] = None):
    ...
and/or should that be a separate flag, something like `--guess-common-names-optionals`?
For non-bool defaults I don't see any reason to change anything since that's already inferred.
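To make the `None`-default question concrete, here is a rough sketch of what such an inference could look like. It uses the standard-library `ast` module only to stay self-contained, which discards comments and formatting, so it is not how a real codemod would be written. The function name `annotate_none_defaults` and the `guesses` mapping are hypothetical and not part of autotyping.

```python
import ast
from typing import Mapping


def annotate_none_defaults(source: str, guesses: Mapping[str, str]) -> str:
    """Annotate unannotated params that default to None as Optional[<guess>].

    `guesses` maps a parameter name to a guessed type (hypothetical table).
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        positional = node.args.posonlyargs + node.args.args
        defaults = node.args.defaults
        # Defaults line up with the last len(defaults) positional parameters.
        pairs = list(zip(positional[len(positional) - len(defaults):], defaults))
        # Keyword-only defaults are stored one per parameter (None if absent).
        pairs += [
            (arg, default)
            for arg, default in zip(node.args.kwonlyargs, node.args.kw_defaults)
            if default is not None
        ]
        for arg, default in pairs:
            defaults_to_none = (
                isinstance(default, ast.Constant) and default.value is None
            )
            if arg.annotation is None and defaults_to_none and arg.arg in guesses:
                annotation = f"Optional[{guesses[arg.arg]}]"
                arg.annotation = ast.parse(annotation, mode="eval").body
    return ast.unparse(tree)
```

Parameters whose names are not in the table are left untouched, which matches the "only guess when reasonably sure" spirit of the discussion. The caller would still need to add `from typing import Optional` to the rewritten module.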
Thanks for your work! I'd be curious to hear @Zac-HD's opinions too, but here are my reactions:
`--guess-common-names` is equivalent to `--guess-common-names=1`, and `--guess-common-names=2` (or `--guess-common-names --guess-common-names`) adds more speculative types.
I would not guess more speculative types; I think they're wrong too often to be useful. And instead of looking at the table, just grab the code that I wrote: https://github.com/HypothesisWorks/hypothesis/pull/3313/files#diff-9cc5e151a0e8c7f741d59c6406f5c0ac31f6284b29072f1dc264255a2dab5d91 (noting that it distinguishes some strategies that are the same type).
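For reference, the "levels" idea floated above (a bare flag meaning level 1, with `=2` or a repeated flag meaning level 2) could be expressed with `argparse` roughly as follows. This is only a sketch of the proposal under discussion, not autotyping's actual CLI; the `GuessLevel` action and `build_parser` helper are hypothetical names.

```python
import argparse


class GuessLevel(argparse.Action):
    """Bare flag bumps the level by one; an explicit `=N` sets it to N."""

    def __call__(self, parser, namespace, values, option_string=None):
        current = getattr(namespace, self.dest, 0) or 0
        new_level = current + 1 if values is None else values
        setattr(namespace, self.dest, new_level)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--guess-common-names",
        action=GuessLevel,
        nargs="?",   # value is optional, so both forms parse
        type=int,
        default=0,   # 0 means: do not guess from names at all
    )
    return parser
```

One caveat with `nargs="?"`: a bare `--guess-common-names` directly followed by a positional argument would try to consume that argument as the level, so the `=N` spelling is the safer one in that position.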
Cool - implemented it and pushed to #48, though leaving it as draft as there are a couple I'm unsure about:
- `List` as a type. `List[Any]` is ofc also an option, which one could later weed out with e.g. mypy's `--disallow-any-generics`.
- `name in ("pred", "predicate")`: `typing.Callable` doesn't accept just setting the return value, so [if we're annotating generics] I'm just annotating this as a generic `Callable`.
- `name in ("amount", "threshold", "number", "num")`: `int|float`, guessing I should just leave them alone?
- `"uuid" in name`: `st.uuids().map(str)` - is that just a `str`?
`typing.Callable` doesn't accept just setting the return value, so [if we're annotating generics] I'm just annotating this as a generic `Callable`.
It does, `Callable[..., ReturnType]`. But this has the same problems as with using `List`.
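A small illustration of the `Callable[..., ReturnType]` form mentioned above: the literal `...` leaves the argument list unspecified while still fixing the return type. The `Predicate` alias and `keep_if` function are made up for this example.

```python
from typing import Any, Callable, List

# Any argument list is accepted; only the return type is constrained.
Predicate = Callable[..., bool]


def keep_if(items: List[Any], pred: Predicate) -> List[Any]:
    """Return the items for which `pred` is truthy."""
    return [item for item in items if pred(item)]
```

A type checker will accept any callable returning `bool` for `pred`, whatever its parameters, which is also why the annotation says little about how `pred` may be called.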
Should generics be typed at all? They're almost always gonna need manual intervention anyway, so I'm unsure about the value of adding List as a type.
Let's just skip them then, the basics are remarkably helpful already and probably have a lower error rate.
Well... I got feeling and wrote inferences for handling stuff like `int_list` and `list_of_widths`. But everything that can't be fully inferred is skipped. And more complicated generics like `dict` or `callable` are skipped.
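As a rough idea of what "fully inferred or skipped" could mean for names like `int_list` and `list_of_widths`, here is a sketch. It is not the code from the PR: the function `guess_list_type` and both lookup tables are hypothetical, and mapping `width` to `int` is purely an assumption for the example.

```python
import re
from typing import Optional

# Element words that are themselves type names.
SIMPLE_TYPES = {"int": "int", "str": "str", "float": "float", "bool": "bool"}
# Hypothetical table of element names with a guessed type (an assumption).
ELEMENT_NAMES = {"width": "int", "name": "str"}


def guess_list_type(name: str) -> Optional[str]:
    """Return 'List[T]' only if the element type can be fully guessed."""
    match = re.fullmatch(r"(?P<elem>\w+?)_list", name) or re.fullmatch(
        r"list_of_(?P<elem>\w+)", name
    )
    if match is None:
        return None  # not a list-like name (e.g. dicts, callables): skip
    elem = match.group("elem")
    candidates = [elem]
    if elem.endswith("s"):
        candidates.append(elem[:-1])  # naive singular: "widths" -> "width"
    for candidate in candidates:
        guessed = SIMPLE_TYPES.get(candidate) or ELEMENT_NAMES.get(candidate)
        if guessed is not None:
            return f"List[{guessed}]"
    return None  # element type unknown: skip rather than emit a bare List
```

Returning `None` instead of a bare `List` keeps the output free of generics that would need manual follow-up anyway.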
Skipping `uuid` too - it seems to mostly be typed as `str` - but `uuid.UUID` or a hexadecimal int feels common enough that it should be skipped.
👋 I just found this project, and `--annotate-named-param` reminds me of a weekend research project I did for the Hypothesis ghostwriter, analyzing a few hundred million arguments in a corpus of Python code: https://github.com/HypothesisWorks/hypothesis/issues/3311. The context is different enough between our projects that you might want to make your own decisions based on the table in that issue rather than just lifting code out of my PR (hereby offered under MIT).
Clauses like `if name.startswith("is_") or name in BOOL_NAMES:` aren't fully expressible with `--annotate-named-param`, so this might be worth a new `--guess-common-names` argument as well as a table of suggestions, e.g. "`pattern` is usually either `str` or `re.Pattern[str]`, or sometimes `bytes`".