astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
28.59k stars 926 forks source link

[red-knot] intern types separately from type inference result #12061

Open carljm opened 4 days ago

carljm commented 4 days ago

Initial exploration of what it might look like to intern types independent of the type inference result, so the granularity of type inference and the granularity of type interning don't have to be coupled.

The goal of this is to make it possible to add definition_ty queries, but this PR doesn't do that yet.

This PR also doesn't work yet :) Just for illustration, I'm using Arc::get_mut in the places where we need to mutate the FileTypeStore, but we can't actually do this because Salsa is still holding on to a reference to the Arc also. To make this work, we'll need to introduce interior mutability to the FileTypeStore, which is a bit tricky along with wanting to return references to type structs stored inside it. The old TypeStore in the old red_knot crate shows one approach to this, but I'm not sure it's the best option.

Having the FileTypeStore returned by a salsa query at all might require the Salsa "mutable arena" feature discussed in https://github.com/salsa-rs/salsa/pull/490#discussion_r1641000198 -- if so, for now we could instead just keep our our own dashmap of VfsFile to FileTypeStore on the Db, but not actually in Salsa.

Some nice things about interning types ourselves: a lot fewer lifetimes (no lifetime on Type enum anymore), and we can eliminate the TypingContext indirection layer and just use Db in all APIs.

MichaReiser commented 3 days ago

As reference, this is how rustc interns types:

The data type:

https://github.com/rust-lang/rust/blob/42add88d2275b95c98e512ab680436ede691e853/compiler/rustc_middle/src/ty/adt.rs#L97-L106

A reference to a type (bound to the TyCtxt lifetime)

https://github.com/rust-lang/rust/blob/42add88d2275b95c98e512ab680436ede691e853/compiler/rustc_middle/src/ty/adt.rs#L172-L174

The definition of Interned

https://github.com/rust-lang/rust/blob/42add88d2275b95c98e512ab680436ede691e853/compiler/rustc_data_structures/src/intern.rs#L13-L25

And this is used in TyCtxt which is rustcs arena where it interns all data.

https://github.com/rust-lang/rust/blob/42add88d2275b95c98e512ab680436ede691e853/compiler/rustc_middle/src/ty/context.rs#L194-L197

MichaReiser commented 3 days ago

I do like splitting out the type internet from the type inference result. I think it makes sense to have them separate and also allows more effective reuse.

The part that's unclear to me and that you already mentioned on your write up is how to collect unused types. How can we prevent that this is an ever-growing arena?

We could try to track type references by using an AtomicUsize (not an Arc because that would heap allocate each type!) It would give us very fine-grained reclamation of types, at least if we use a slab like arena that supports reusing IDs. An other possibility could be to rely on Salsa by using one interner per file, package or some other granularity, and the interner would be retrieved by calling a salsa-query. This won't give us type level collection, but salsa would evict the entire type store if e.g. the file gets deleted (or all files in a package).