[red-knot] Test setup utilities

sharkdp commented 1 month ago

This recently came up in a discussion. A lot of red-knot tests require some form of "setup" in the sense that they create variables of a particular type. This is not always straightforward. For example, to create a variable of type int, you need to make sure that it doesn't end up as a LiteralInt[…]. So some tests use a pattern like this:

def int_instance() -> int: ...

x = int_instance()  # to create x: int

To create a variable with a union type, a lot of tests follow a pattern like

x = a if flag or b  # to create x: A | B

It's unclear to me if this really requires any action, but I thought it might make sense to discuss this in a bit more detail. Here are some approaches (some less generic than others) that I could think of.

1. `def f() -> MyDesiredType; x = f()`

Upsides:

You can specify MyDesiredType directly

Downsides:

Does not work for union types (yet).
Rather verbose, requires coming up with yet another name (for the function)
Does not work at runtime (i.e. you can't just paste this into an interpreter and see what the runtime type is).

2. `def f(x: MyDesiredType): …`

Upsides:

You can specify MyDesiredType directly
Fewer lines than the pattern above

Downsides:

Does not work yet
All tests are now within a function, requires coming up with a name for the function
Does not work at runtime (see above)

3. `a if flag or b`

(only relevant for union types)

Upsides:

Does work at runtime, if you setup flag beforehand

Downsides:

Can't see the result type in code
Relies on an undefined variable, which leads to extraneous diagnostic outputs (e.g. if you play with these snippets interactively, or if you want to see what other type checkers do). Maybe this could be prevented by injecting flag into the test environment somehow.
Arguably not very beginner-friendly (what is flag?!)

4. Helper functions like `one_of(a, b)`

We could inject new functions, just for testing purposes. For example, we might have a function similar to

def one_of[A, B](a: A, b: B) -> A | B:
    if random.randint(0, 1):
        return x
    else:
        return y

to easily create union types

Upsides(?):

x = one_of(1, None) is slightly more readable than x = 1 if flag else None (but only if you know what one_of does)
Works at runtime, but requires having the helper functions available

Downsides:

Can't see the result type in code
The snippets are not self-contained anymore. You can not simply copy a snippet and try it out in the mypy playground, for example (unless you have a copy of the test utilities in there already).
Requires beginners to learn about these utilities

5. A magic `conjure` function

I'm not even sure if this is technically possible, but other languages have ways to create values of type T out of nothing. Not actually, of course. But for the purpose of doing interesting things at "type check time". For example, C++ has std::declval<T>(). Rust has let x: T = todo!(). Functional languages have absurd :: ⊥-> T.

You can't specify explicit generic parameters in a function call in Python (?), so we couldn't do something like x = conjure[int | None](), but maybe there is some way to create a construct conceptually similar to

def conjure[T]() -> T: ...  # Python type checkers don't like this

I think I would personally prefer the simple def f(x: MyDesiredType): … approach, once we make that work.

AlexWaygood commented 1 month ago

You can't specify explicit generic parameters in a function call in Python (?), so we couldn't do something like x = conjure[int | None]()

Correct -- for now, the best you can do is x: int | None = conjure() or x = typing.cast(int | None, conjure()). But see https://peps.python.org/pep-0718/ for a proposal to change this. (Discussed at https://discuss.python.org/t/pep-718-subscriptable-functions/28457.)

AlexWaygood commented 1 month ago

(I edited your post to number your suggestions so it would be easier to discuss them, hope that's okay :-)

Your proposals (4) and (5) both involve injecting some sort of "magic" function into the namespace that we could use without any imports, which would then create instances of the types required for the test. My first instinct was that I didn't much like the idea, because in general I'd like the test snippets to be as close as possible to executable Python code. I think it's useful to keep a close resemblance between our test snippets and user code we'll actually be running on. I also think keeping our test snippets as close as possible to executable Python makes them much easier for us and external contributors to understand.

However, I then realised that this isn't really that different to what we already do with reveal_type. At runtime, reveal_type is not a builtin -- you have to import it from typing or typing_extensions if you want to use it in such a way that your code will not crash when you actually run your code with a Python interpreter. But we pretend it's a builtin, so that users can easily debug their type-checking results without having to add an import, and so that we can keep our test snippets concise. I argued against this when we were designing the test framework (I said we should have to explicitly import reveal_type in order to use it in test snippets), but @carljm pushed for it, partly on the grounds that it would significantly reduce the boilerplate of our tests. In retrospect, I think he was probably right; it would be a bit of a pain to have to import reveal_type in every test snippet.

The key differences with reveal_type are:

Other type checkers also pretend that reveal_type is a builtin. If you want to do a cross-comparison of a red-knot test with how mypy or pyright infer the types, you can just copy and paste it into their playgrounds currently. But if we injected a magic one_of or conjure function, we'd have to remember to add those function definitions to the snippet before mypy or pyright would accept them.
We'd only be injecting one_of or conjure into the namespace of test snippets, whereas for reveal_type we also pretend it's a builtin when checking user code.

I think I would personally prefer the simple def f(x: MyDesiredType): … approach, once we make that work.

Yes, I think I agree. Mypy has quite an extensive test suite that works in a similar way to our new framework, and they've managed to do without a conjure() function or one_of(). That doesn't mean that the idea is bad, of course! But it does suggest that it should be possible to do without it. And even if our test snippets already don't look exactly the same as executable Python code would (due to all the unimported reveal_type usages), it's nice to limit the differences as much as possible.

One way that mypy test snippets do differ from executable Python is in their use of "fixture stubs". Rather than using their full vendored stdlib typeshed stubs (which is what they use for checking user code), in their tests they use a radically simplified version of typeshed. This speeds up their tests a lot, but it is very frequently a source of confusion for mypy developers and contributors, who often think they've fixed a bug only to realise that the type inference their users are seeing for standard-library functions is very different to the type inference they thought they had asserted in their test snippets.

carljm commented 1 month ago

Just for the sake of discussion, another possibility here is to allow "layering" files, so in a Markdown header section you can provide a file that will be shared by all sub-tests within that section. So this would let you write your own little utilities (or simply type-annotated variables in a stub file) and import/reuse them in a bunch of related tests. Downsides are less locality of tests, and more complexity in understanding the structure and behavior of a test.

I also don't want to do anything here that's specifically motivated by limitations we should lift soon, like not understanding function arguments, or unions in annotations.

I think on the whole my preference is also defining functions with typed arguments, in most cases.

astral-sh / ruff