carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/
Other
32.24k stars 1.48k forks source link

Add an instruction to represent a use of a dependent value from a generic instance. #4122

Closed zygoloid closed 2 months ago

zygoloid commented 2 months ago

We can't use the instruction from the generic directly, because it doesn't have the right constant value. Instead add an instruction that models the transition from the constant value in the generic to the constant value in the generic instance.

Also start associating the self generic instance with unqualified lookups that find results in an enclosing generic, so that we track the information necessary to create the new instruction.

zygoloid commented 2 months ago

Also, I think I'm missing something: why is the instance associated into an instruction after name lookup, rather than the instruction in name lookup having the instance id associated? That is, I think I'd understand the current InstanceConstant approach better if it were what's in name lookup, versus being created per name lookup.

Per toolchain discussion on 2024-06-25, we model generic instances as a minimal overlay over the generic, rather than a separate new entity. That means that we don't create a new name lookup table / name scope per instance, and we don't create instance-specific versions of each (dependent) instruction within a generic. There is no instruction representing the version of an instruction from a generic, with the constant value / type taken from a specific instance; that's what the new InstanceConstant is representing.

We could generate the InstanceConstants eagerly as part of forming the generic instance, or at least cache them on the generic instance. However, that would mean that we'd need to treat either NameRefs or InstanceConstants that appear in a generic as a special case when forming instances of that generic, because we need to substitute into the arguments that appear in the argument list of the generic instance in the InstanceConstant when substituting into the generic. For example:

class A(T:! type) { var x: T; }
class B(T:! type) { extend base: A(T*); }
fn F[U:! type](b: B(U)) -> T* { return b.x; }

The generic F has a name reference to x in A(U*). When forming an instance of F, say with U = i32, we want to remap that reference to A(U*).x into a reference to A(i32*).x. Normally, forming an instance just means substituting throughout the dependent instructions within F, so this works out if the InstanceConstant that picks the A(U*) version of x is within F. But if we store / cache the InstanceConstant with the GenericInstance, then we'll somehow need to know when transforming the NameRef that we need to pick a different instance of x.

So I think putting the reference to the generic instance with the code that uses the name x is the simpler choice.

Looking at this, had you considered a GenericNameRef? Right now, you create 2 instructions for a NameRef that involves a generic, or 32 bytes. We could also have a GenericNameRef that stores the inst_id+instance_id in a side table, or 24 bytes per name reference (in this example, name_id + index as the 2 instruction args).

I'm wondering about this mainly because I expect references to generics to be a common pattern, so shaving the size of references might have significant impact.

Yeah, this may be a good thing to explore. There are some other dimensions here: for a member name reference, it'd be good for the SemIR to mention the base expression in which we accessed the member, too. For what it's worth, I think the overhead here is even a little higher than you're suggesting: there's also 4 bytes for the constant ID and 4 bytes for the node ID, so I think it's 20 bytes extra for the second instruction versus maybe 8 bytes per instruction in a side table. On the other hand, there's some maintenance and code complexity costs in the side table, plus worse locality for SemIR traversal.

I'll add a TODO to explore this.