externref/funcref distinction fundamental for toolchain?

wingo commented 4 years ago

Hello,

While implementing ref.null support in LLVM, I ran into an interesting issue. The summary is that I think it makes sense to define an LLVM-specific top type encompassing both externref and funcref; an anyref, if you will.

Concretely, ref.null needs a type operand. The instruction is specified as being ref.null REFTYPE (https://webassembly.github.io/reference-types/core/syntax/instructions.html#reference-instructions), and encodes as such in the binary.
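To make the encoding concrete, here is a minimal sketch of how ref.null and its type operand end up in the binary. The opcode (0xD0) and the two reftype bytes (0x70 for funcref, 0x6F for externref) are from the reference-types binary format; the function and constant names are made up for illustration and are not LLVM code.

```python
# Illustrative only: names are hypothetical, byte values are from the
# reference-types binary format.
REF_NULL_OPCODE = 0xD0
REFTYPE_BYTE = {
    "funcref": 0x70,
    "externref": 0x6F,
}


def encode_ref_null(reftype: str) -> bytes:
    """Encode `ref.null <reftype>` as the opcode followed by the reftype byte."""
    return bytes([REF_NULL_OPCODE, REFTYPE_BYTE[reftype]])
```

Both variants share one opcode and differ only in the immediate, e.g. `encode_ref_null("externref").hex()` gives `d06f`.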

Currently REFTYPE can only be externref or funcref. However, with typed function references the set becomes unbounded, as users can define their own, e.g. (func (i32 i32) -> (i64 f32)) and similar. So at least at the MC layer we will need ref.null to have a reftype operand. Therefore I will probably make a Reftype operand kind, which is similar in a way to the Signature operand to block instructions.

To compare, the approach taken in the implementation of the table instructions was to provide e.g. TABLE.GET_externref for tables returning externref, and TABLE.GET_funcref for those returning funcref. Given the signature operand to ref.null though, it is not necessary from a target point of view to have two kinds of ref.null.

Which leads me to my proposal: what good does it do us in LLVM to distinguish externref and funcref values as different MachineValueTypes? The distinction is not sufficient to provide the information needed by ref.null, and yet it is not necessary for instructions like table.get. It would be simpler if we could just treat all reference types the same.

In the case of table.get and similar instructions, it turns out that discriminating between externref and funcref is not necessary for the target encoding; the result of table.get is the type of the table. We could remove the duplicate instruction definitions, and define table.get as just returning a value of type anyref.
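As a rough illustration of that point, the sketch below encodes table.get from a table index alone. The opcode (0x25) and the unsigned LEB128 index are from the reference-types binary format; the helper names are hypothetical. Note that no element type appears anywhere in the bytes.

```python
# Illustrative only: helper names are hypothetical.
TABLE_GET_OPCODE = 0x25


def encode_u32_leb128(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as LEB128, 7 bits per byte."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in u32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, high bit clear
            return bytes(out)


def encode_table_get(table_index: int) -> bytes:
    """Encode `table.get <table_index>`; the element type is not an operand."""
    return bytes([TABLE_GET_OPCODE]) + encode_u32_leb128(table_index)
```

The same bytes come out whether table 0 holds funcref or externref: `encode_table_get(0)` is `b"\x25\x00"` either way.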

If this analysis is right, we should replace the externref and funcref MVTs with one anyref. If the difference is important for the instruction encoding, the instruction will have to take a Reftype operand. I will work up a patch.

One question is how to represent ref.null at the IR level. Given that the set of types is unbounded (once we have typed function references), a quick-and-dirty way would be to define the intrinsic as anyref __builtin_wasm_ref_null(const char *type), and pass either externref or funcref as immediate strings. This is just a placeholder idea, I guess.

For context, it used to be that there was just anyref in the reference-types proposal, but it was later changed to externref and funcref. This was essentially for run-time concerns, AFAIU: you might want to represent function references and GC objects differently, and forcing a top type onto them constrains the run-time in undesirable ways. I get that. But for the compiler, it doesn't seem to me like the difference buys us anything.

Cc @tlively @sbc100 @pmatos. If this discussion might be better elsewhere, happy to take it there :)

tlively commented 4 years ago

Hi @wingo!

> Given the signature operand to ref.null though, it is not necessary from a target point of view to have two kinds of ref.null.

We generally have not worried about declaring extra instructions in the backend that are not strictly necessary. In fact, if we were going for true minimality, we could just have a single register class called VAL and deduplicate all the instructions that are identical except for the register class of their arguments or results. This would be possible because the stacky version of the instructions that actually makes it to the MC layer doesn't use register operands at all, so the difference between all the register classes is erased. The reason we don't do this is that we get simple type validation for free from the MachineInstrVerifier when we use a separate register class for each value type. Unless we want to overhaul this system entirely and move to just a single register class, I would continue treating funcref and externref separately and generate a separate version of ref.null for each of them, even if it's redundant.
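A toy model of that trade-off, not actual LLVM code: each instruction declares the register class it expects per operand, and a check compares those against the classes of the registers actually used. The instruction names and the `operands_ok` helper are hypothetical.

```python
# Toy model only: instruction names and tables are hypothetical.
# One register class per value type, so instructions are duplicated per type.
TYPED_INSTRS = {
    "TABLE_SET_FUNCREF": ("I32", "FUNCREF"),
    "TABLE_SET_EXTERNREF": ("I32", "EXTERNREF"),
}

# A single VAL class, so one definition covers every value type.
UNTYPED_INSTRS = {
    "TABLE_SET": ("VAL", "VAL"),
}


def operands_ok(instrs: dict, opcode: str, operand_classes: tuple) -> bool:
    """Return True if the operands' register classes match the declaration."""
    return instrs[opcode] == tuple(operand_classes)
```

With per-type classes, feeding an externref register to the funcref instruction is rejected: `operands_ok(TYPED_INSTRS, "TABLE_SET_FUNCREF", ("I32", "EXTERNREF"))` is `False`. With a single VAL class every register has the same class, so the equivalent mistake passes the check unnoticed.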

wingo commented 4 years ago

Sure, can do. Hard to know when to treat things the same and when to treat them as different. I will do the two-instructions thing.

I still think we are making an error differentiating between the two, though. Once there are more types, so that there's no useful way in which e.g. two "funcref" values can be treated the same because we will be reasoning about different concrete types, the utility of the externref/funcref split seems minimal to my (ignorant!) eyes. It's as if there were an MVT for (linear-memory) pointer to struct and another for pointer to class; "weird flex, but ok" ;-) I guess we will find out later!

WebAssembly / tool-conventions

externref/funcref distinction fundamental for toolchain? #150