Open aardappel opened 3 years ago
Interesting points.
To me, the former rather seems to suggest introducing a iNN.nez
instruction instead of duplicating every bool-consuming instruction. In fact, I remember earlier complaints that Wasm does not have such an instruction. That seems enough of a no-brainer.
The latter seems to suggest that wasm64 should include i64-indexed table types as well -- otherwise a call_indirect with i64 operand would be rather incoherent, and moreover, it might not be the only table instruction one might want to use with i64 index, once that type is used for table indices.
That probably would easy to design is well, and in principle it'd nicer to ship a coherent wasm64 package. But it is somewhat harder to motivate at this point and probably a significant implementation effort.
@rossberg
iNN.nez
would indeed be welcome, as that double eqz
sequence is very odd.. and would be easier to ensure for even simpler engines that it becomes a no-op (or close to).
Yet, at the same time, replacing 2 ops with 1, and still making wasm64 fundamentally less equal than wasm32 seems inelegant to me. You say "duplicating every bool-consuming instruction".. there's just 3 of them. Would you say duplicating is likely better than polymorphizing the existing ones?
Certainly, adding new control flow instructions that have their own encoding is a lot bigger churn on tools & engine code, but I could imagine a polymorphic instruction has consequences for the validator.. then again we just made all memory ops polymorphic already.
It wouldn't be polymorphic (= work for all types of a certain shape), though, it would be overloaded (= work for a special finite set of types). Wasm has intentionally tried to hold the line and avoid overloading.
And if you duplicate instructions, will there only be 3 such instructions forever? Will it be only two integer/pointer types forever? Is it consistent to have i64-bool consuming functions but not i64-bool producing functions (ref.is_null etc)? This would be a fairly odd design that scales poorly. For better or worse, Wasm uses i32 as its bool type. (Originally we even discussed introducing a separate bool type, in which case such shortcuts wouldn't be possible either.)
Additionally, br_table
privileges i32. In a system with tagged i64 addresses, br_table (ptr & MASK)
may be a thing for overloaded operations.
@rossberg
Will it be only two integer/pointer types forever?
Linear memory pointer types, yes :) Are you seriously arguing that we shouldn't extend support for 64-bit pointers because one day we may have 128-bit pointers??
but not i64-bool producing functions (ref.is_null etc)?
ref.is_null
doesn't produce a pointer, so is fine to have its output consumed by the existing 32-bit version.
For better or worse, Wasm uses i32 as its bool type.
Wasm doesn't have a bool type, it has bool instructions, that can operate on anything i32 sized, including pointers.
@lars-t-hansen
br_table privileges i32
But it doesn't operate on pointers directly. The fact that its input can possibly originate from bits inside a pointer doesn't sound quite as directly applicable as the null-checks in my original post.
Are you seriously arguing that we shouldn't extend support for 64-bit pointers because one day we may have 128-bit pointers??
I am arguing that an instruction set preferably is regular. If some instruction is available for more than one int type, then I'd like it to work for all available int types.
(As an aside, I also wouldn't be totally surprised if we get i16 (or even wasm16) some day, despite all intentions to the contrary.)
ref.is_null
doesn't produce a pointer, so is fine to have its output consumed by the existing 32-bit version.
A compiler may have good reasons to use "word size" for bools in various circumstances. In wasm64 that would mean 64 bit bools. By the same argument you bring up above, it would hence take 64 bit versions of bool-producing instructions to bring wasm64 "on equal footing". So, if that's the goal, then we ought to consistently support other sizes for all bool ops.
I also wouldn't be totally surprised if we get i16 (or even wasm16) some day, despite all intentions to the contrary.
If that ever happens (which I find exceedingly unlikely) then the cost to make similar changes as we are making here for wasm64 don't seem that onerous to me. I think we can agree that 64-bit pointers are endlessly more important for this world than 16 or 128-bit ones at this point?
A compiler may have good reasons to use "word size" for bools in various circumstances. In wasm64 that would mean 64 bit bools. By the same argument you bring up above, it would hence take 64 bit versions of bool-producing instructions to bring wasm64 "on equal footing". So, if that's the goal, then we ought to consistently support other sizes for all bool ops.
I would say that the concern of being able to represent a bool type as an i64
is completely separate from the question of wether we should be able to test an existing i64 conveniently as a bool. Like I said, for ref.is_null
, if it is tested directly, there is no issue. The only issue might be if you want to store the result of ref.is_null
in a local or in memory as an i64 because the language wants to interpret all bools as i64
and wants to operate on this value before it is tested. I find that a lot more far fetched than the simple need to test pointers for null.
More generally, you seem to take an all or nothing approach around "what is a bool?" and want to disallow very frequently useful instruction sequences if we don't do it for all possible such sequences that relate to bools. This seems counter to the idea that Wasm is an implementation substrate, not a language. How to represent a bool or a pointer is a concern of the language mapping to Wasm, and we should simply give them the tools to do so. We are already very biased in what concepts we allow a language to map directly, so being pragmatic and enabling just a few common sequences for 64-bit pointers doesn't seem that crazy to me.
I think iNN.nez
is pretty uncontroversial, so we should consider that separately. It may be worth including in the memory64 proposal, but we may want to include it with some other integer instructions.
As for the rest (overloading br_if
, if
, select
), I'd be interested to see some data -- how often this occurs, and how many extra bytes we're talking about. Code size is important, but so is producer and consumer complexity. So going back to your initial options:
- It could be added to the current proposal because it is a minor change.
- It should be a follow-up proposal because it would endanger Memory64 to be delayed.
- It's not worth it and is not needed at all.
I'd go with 2, unless we can show value now.
I'd be interested to see some data -- how often this occurs
It's hardly possible to produce fair data about this at this point. Maybe some sample binaries we have right now don't use it at all, and some future language that depends on null-pointer checks has them absolutely everywhere.
I am going more by principle, that something as simple as a null-pointer check should not need 1/2 extra instructions compared to the wasm32
version, just because we designed Wasm around 32-bit everything and we don't care to make 64-bit a symmetric extension.
But sure, we can wait until the MVP lands, and someone actually needs this. It's more important that the basics of this proposal land first, the problem in this issue will not hold anyone back fundamentally.
It was an internal security routine I did with this issue an random double word identification code
There are a number of operations in Wasm that are
i32
only, mainly because they are deemed booleans, primarily among themif
,br_if
andselect
. Because pointers were 32-bit, and testing a null-pointer is a very frequent operation, these could be used with pointers.But so far in this proposal we do not have this functionality. 32-bit code like:
Now has to be emitted as:
Or (1 more byte):
Though I'm sure this has no consequences in most engines, it is extra code size, and if we want Memory64 to be on equal footing, I think it be worth considering 64-bit versions of these instructions (or making them polymorphic).
More minor concerns being that it requires special casing in producers, and is just less readable.
It may not be super frequent in C/C++/Rust code, but imagine higher level languages that emit null checks on each memory access.
We already have code in Binaryen that emits extra truncates for checking alignment, for example.
Another instruction to consider is
call_indirect
that always takes ani32
index. Thing is, that index is what many producers will consider a function pointer. For example, we already have code in LLVM to emit truncates here, because we store these indices in ani64
for uniformity with other kinds of pointers in the wasm64 target. This is less likely to be performance sensitive than the above example though, but could be considered out of uniformity.Question is, when?
Opinions?