Hi @wingo!
Given the signature operand to ref.null though, it is not necessary from a target point of view to have two kinds of ref.null.
We generally have not worried about declaring extra instructions in the backend that are not strictly necessary. In fact, if we were going for true minimality, we could just have a single register class called VAL and deduplicate all the instructions that are identical except for the register class of their arguments or results. This would be possible because the stacky version of the instructions that actually makes it to the MC layer doesn't use register operands at all, so the difference between all the register classes is erased. The reason we don't do this is because we get simple type validation for free from the MachineInstrVerifier when we use a separate register class for each value type. Unless we want to overhaul this system entirely and move to just a single register class, I would continue treating funcref and externref separately and generate a separate version of ref.null for each of them, even if it's redundant.
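To make the type-validation argument concrete, here is a toy Python model of it. It is only a sketch and not LLVM code: the names VReg, InstrDesc, verify, TABLE_SET_FUNCREF and TABLE_SET_VAL are hypothetical, and the check stands in for what the verifier does when each value type has its own register class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VReg:
    """A virtual register tagged with its register class (hypothetical model)."""
    name: str
    reg_class: str  # e.g. "I32", "FUNCREF", "EXTERNREF", or a single "VAL"


@dataclass(frozen=True)
class InstrDesc:
    """An instruction description: its name and the class expected per operand."""
    name: str
    operand_classes: tuple


def verify(desc, operands):
    """Return a list of error strings; an empty list means the operands check out."""
    if len(operands) != len(desc.operand_classes):
        return [f"{desc.name}: expected {len(desc.operand_classes)} operands"]
    errors = []
    for index, (expected, reg) in enumerate(zip(desc.operand_classes, operands)):
        if reg.reg_class != expected:
            errors.append(
                f"{desc.name}: operand {index} ({reg.name}) is "
                f"{reg.reg_class}, expected {expected}"
            )
    return errors


# One register class per value type: mixing up reference types is caught.
TABLE_SET_FUNCREF = InstrDesc("TABLE_SET_FUNCREF", ("I32", "FUNCREF"))

# A single VAL class: the same mix-up can no longer be expressed, let alone caught.
TABLE_SET_VAL = InstrDesc("TABLE_SET_VAL", ("VAL", "VAL"))
```

With separate classes, handing an EXTERNREF register to TABLE_SET_FUNCREF yields an error from verify; with the single VAL class every operand trivially matches, so the same mistake passes silently. That is the trade-off between minimality and free validation described above.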
Sure, can do. Hard to know when to treat things the same and when to treat them as different. I will do the two-instructions thing.
I still think we are making an error differentiating between the two, though. Once there are more types, so that there's no useful way in which e.g. two "funcref" values can be treated the same because we will be reasoning about different concrete types, the utility of the externref/funcref split seems minimal to my (ignorant!) eyes. It's as if there were an MVT for (linear-memory) pointer to struct and another for pointer to class; "weird flex, but ok" ;-) I guess we will find out later!
Hello,
Going to implement ref.null support in LLVM, I ran into an interesting issue. The summary is that I think it makes sense to define an LLVM-specific top type encompassing both externref and funcref; an anyref, if you will.
Concretely, ref.null needs a type operand. The instruction is specified as being ref.null REFTYPE (https://webassembly.github.io/reference-types/core/syntax/instructions.html#reference-instructions), and encodes as such in the binary.
Currently REFTYPE can only be externref or funcref. However with typed function references, the set becomes unbounded, as users can define their own e.g. (func (i32 i32) -> (i64 f32)) and similar. So at least on the MC layer we will need for ref.null to have a reftype operand. Therefore I will probably make a Reftype operand kind, which is similar in a way to the Signature operand to block instructions.
To compare, the approach taken in the implementation of the table instructions was to provide e.g. TABLE.GET_externref for tables returning externref, and TABLE.GET_funcref for those returning funcref. Given the signature operand to ref.null though, it is not necessary from a target point of view to have two kinds of ref.null.
Which leads me to my proposal: what good does it do us in LLVM to distinguish externref and funcref values as different MachineValueTypes? It's not sufficient to provide the information needed to ref.null, and yet not necessary for instructions like table.get. It would be simpler if we could just treat all reference types the same.
In the case of table.get and similar instructions, it turns out that discriminating between externref and funcref is not necessary for the target encoding; the result of table.get is the type of the table. We could remove the duplicate instruction definitions, and define table.get as just returning a value of type anyref.
If this analysis is right, we should replace the externref and funcref MVT's with one anyref. If the difference is important for the instruction encoding, the instruction will have to take a Reftype operand. I will work up a patch.
One question is, how do we represent ref.null on the IR level. Given that the set of types is unbounded (once we have typed function references), a quick-and-dirty way would be to define the intrinsic as anyref __builtin_wasm_ref_null(const char *type), and pass either externref or funcref as immediate strings. This is just a placeholder idea, I guess.
For context, it used to be that there was just anyref in the reference-types proposal, but it was later changed to externref and funcref. This was essentially for run-time concerns, AFAIU: you might want to represent function references and GC objects differently, and that forcing a top type onto them constrains run-time in undesirable ways. I get that. But for the compiler, it doesn't seem to me like the difference buys us anything.
Cc @tlively @sbc100 @pmatos. If this discussion might be better elsewhere, happy to take it there :)