astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
30.71k stars 1.02k forks source link

[red-knot] special-case import of built-in modules #12398

Closed carljm closed 1 month ago

carljm commented 1 month ago

Certain modules are imported by BuiltinImporter, which comes first in sys.meta_path by default (before PathFinder which implements the normal filesystem-based module resolution.) These modules will always take precedence over any module on the filesystem (i.e. first-party code can't ever shadow these modules, unless somebody starts rearranging sys.meta_path, which throws everything out the window and we can't reasonably support.)

So our module resolver should special-case these modules to always resolve to typeshed, and never to user code. The module names are listed at https://github.com/python/cpython/blob/main/Modules/config.c.in#L35 -- currently they are marshal, _imp, _ast, _tokenize, builtins, sys, gc, _warnings, _string.

(I also verified experimentally that I can't override these modules by creating e.g. a gc.py or sys.py or builtins.py file.)

AlexWaygood commented 1 month ago

The module names are listed at python/cpython@main/Modules/config.c.in#L35 -- currently they are marshal, _imp, _ast, _tokenize, builtins, sys, gc, _warnings, _string.

This isn't a complete list of the modules that are special-cased by the interpreter; I think there must be additional special-casing done elsewhere. A more complete list can be obtained using the sys.builtin_module_names constant. Experimenting locally confirms that there are names included in that list which are not included in your list and are special-cased in the same way when it comes to module resolution.

We should add special-casing for these modules by writing a Python script to generate a Rust function similar to the one in https://github.com/astral-sh/ruff/blob/main/crates/ruff_python_stdlib/src/sys.rs, which can be queried to determine which modules are considered builtin on any given Python version.

MichaReiser commented 1 month ago

It might be worth double-checking the list of paths with the paths that Pyright uses.

AlexWaygood commented 1 month ago

It might be worth double-checking the list of paths with the paths that Pyright uses.

From what I can tell, I'm not sure pyright ever respects first-party modules overriding stdlib modules. I can't find any relevant list of paths in the pyright source code; and from local experimentation, pyright appears to resolve random to the stdlib module even if it's locally overridden (despite emitting a warning telling you that the local random.py file is shadowing a stdlib module).

I have high confidence that the list of modules given by sys.builtin_module_names is the correct one, however.

carljm commented 1 month ago

Yeah, sys.builtin_module_names is the right complete list. It's the same special-casing (BuiltinImporter), it's just that the file I linked is the input template to code generation that happens at CPython build time, resulting in an output config.c module that has more built-in modules listed in the inittab. (This is because some built-in modules are conditionally built.)

AlexWaygood commented 1 month ago

Status update: https://github.com/astral-sh/ruff/commit/d8cf8ac2ef26bb630b43b095f61662173b2bac2f just added a hardcoded list directly into resolver.rs, based on the builtin modules that I have locally on Python 3.12. We should still change this to be a generated Rust function similar to the one in https://github.com/astral-sh/ruff/blob/main/crates/ruff_python_stdlib/src/sys.rs.