Closed sifmelcara closed 5 years ago
Our current plan is to keep the IR generator producing strict assembly for now, because the EVM is still our main backend and because the optimizer is built for this dialect. We want to create a prototype translator for yet another dialect of Yul (#6547) to web assembly. To complete the transformation sequence, a third translation from strict assembly to this restricted wasm-targeted yul dialect would be needed.
If you want, you could work on this third transformation, but please keep in mind that the restricted wasm dialect is still subject to change and does not yet define the blockchain-related access functions.
Some questions:
I just found out that some of my questions have already been answered in #6274. So the strict Yul described in the document is planned to be implemented in the future, but not now. Am I correct?
So here is my imagination of how will the MVP implementation look like:
(A)
+----------+
| Solidity |
+----+-----+
|
| libsolidity/codegen/ir/
(B) v
+----+------------+
| strict assembly |
| (EVMDialect) |
+----+------------+
|
| type labeler (to be implemented)
|
(C) v
+----+--------------+
| Yul (EVMDialect) |
| All types are |
| still u256 |
+----+--------------+
|
| Add WasmDialect builtin function
| implementation of EVM opcodes to
| the yul source code.
|
(D) v
+----+------------------+
| Yul (WasmDialect) |
| The code now contains |
| u32, u64, and u256 |
+----+------------------+
|
| Replace identifiers that have
| type u256 with four u64 identifiers
|
(E) v
+----+------------------+
| Yul (WasmDialect) |
| All types are u32/u64 |
+----+------------------+
|
| libyul/backends/wasm/
|
v
+----+----+
| eWasm |
+---------+
I'm not sure if this "type labeler" is necessary (to be a separate transformation stage) at this moment. It translates strict assembly to Yul so I guess maybe this can also be useful for solidity to EVM pipeline in the future.
We append our own implementation of EVM opcodes to the yul source code. After this step, we don't need to parse EVM opcodes as FunctionalInstruction
anymore.
Here are two examples: (I'm not familiar with wasm, please forgive me if I get something wrong)
function add(x: u256, y: u256) -> result: u256
{
let x1, x2, x3, x4 := unpacku256(x)
let y1, y2, y3, y4 := unpacku256(y)
let r1 := i64.add(x1, y1)
let r2 := i64.add(x2, y2)
let r3 := i64.add(x3, y3)
let r4 := i64.add(x4, y4)
// ...
// dealing with carrying
// ...
result := packu256(r1, r2, r3, r4)
}
function mload(location: u256) -> result: u256
{
let numCurrentMemoryPage := memory.size()
if i32.le_u(
i32.mul(numCurrentMemoryPage, i32.mul(64: u32, 1024: u32)) // page size 64k
i32.add(location, 32: u32) // MLOAD loads 32 bytes at once
)
{
memory.grow(...)
}
let location32 := u256tou32(location)
let r1 := i64.load(i32.add(location32, 0: u32), 5: u32)
let r2 := i64.load(i32.add(location32, 8: u32), 5: u32)
let r3 := i64.load(i32.add(location32, 16: u32), 5: u32)
let r4 := i64.load(i32.add(location32, 24: u32), 5: u32)
result := packu256(r1, r2, r3, r4)
}
Now only type conversion functions (u256tou32
, u256tou64
) and u256 <---> (u64, u64, u64, u64)
conversion functions (packu256
, unpacku256
) revolve around u256
data type. The usage of these functions can be removed at the transformation from (D) to (E).
We transform something like
function f(a: u64, b: u256, c: u64) -> result1 := u256, result2 := u64 {}
let x, y := f(u, v, s)
to
function f(a: u64, b_1: u64, b_2: u64, b_3: u64, b_4: u64, c: u64) -> result1_1: u64, result1_2: u64, result1_3: u64, result1_4: u64, result2 := u64 {}
let x_1, x_2, x_3, x_4, y := f(u, v_1, v_2, v_3, v_4, s)
Thus packu256
and unpacku256
can be removed.
Oh, and FWIW, EVM is big-endian while wasm is little-endian. We need to be careful.
@sifmelcara yep, that's exactly how I imagined it! Note that we might introduce (semantic) types already at the IR generation stage, for example to distinguish between a memory pointer, a storage pointer and a "regular" number. There should be an issue about that already.
I'll try to finish multi-returns for wasm today, so you could already work on (C) -> (D) already, even though without types (i.e. assuming everything is u256 before the transform and u64 after the transform).
I decide to exchange the order of (C) -> (D) and (D) -> (E) pipeline, by doing this, we don't even need packu256
, unpacku256
and type conversion functions anymore.
I remember there was a discussion about whether we want to allow the following code to work, but I can't find the discussion anymore
{
function foo() -> x1, x2 { }
function bar(x1, x2, x3, x4) { }
function main()
{
bar(1, foo(), 2)
}
}
Any ideas? Maybe I can assume input is always pseudo-SSA to side-step this complicate case of transformation though.
Currently the only place where a function that returns multiple values can be used is in a variable declaration or an assignment.
We should probably take a look at some more examples to see whether replacing EVM functions by eWasm functions or removing large types makes more sense to do first. Depending on what is done first, we either need multi-variable versions of EVM opcodes or large-type versions of eWasm functions...
Depending on what is done first, we either need multi-variable versions of EVM opcodes or large-type versions of eWasm functions
I think the difference is how "ewasm's implementation of Functional Instruction" 's interface will look like: add(u256, u256) -> u256
v.s add(u64, u64, u64, u64, u64, u64, u64, u64) -> u64, u64, u64, u64
.
So... the question is, how can we convert the following code's type from u256 to four u64?
{
function foo() -> r: u256 { }
function bar(x: u256) -> b:u256 { }
function main()
{
for {} bar(foo()) {} {}
}
}
If we simply replace u256 by four u64, the code becomes:
{
function foo() -> r: u64, u64, u64, u64 { }
function bar(x1: u64, x2: u64, x3: u64, x4: u64) -> b1: u64, b2: u64, b3: u64, b4: u64 { }
function main()
{
for {} bar(foo()) {} {}
}
}
which is not valid.
A lazy solution would be to apply #6480 and expression splitter before this transformation...
Yes indeed, we need the expression splitter! Also my hope is that we can have a transform that removes some of those variables if we can show that the values before the transformation are small enough (I think we also have an issue about that).
Also my hope is that we can have a transform that removes some of those variables if we can show that the values before the transformation are small enough
I agree! I think this is kind of https://en.wikipedia.org/wiki/Value_range_analysis and should be part of the optimization phases.
I have finished the u256 to four u64 transformation (except If
and Switch
, and currently assume we have #6480 and expression splitter). Currently it feels like
$ ./build/test/tools/yulopti code.yul
----------------------
{
function swap(x, y) -> a, b
{
a := y
b := x
}
function main(v1, v2) -> r1, r2
{
let tmp := 7
v1 := 9
r1, r2 := swap(v1, tmp)
r1, r2 := swap(42, r2)
}
}
(q)quit/(f)flatten/(c)se/initialize var(d)ecls/(x)plit/(j)oin/(g)rouper/(h)oister/
(e)xpr inline/(i)nline/(s)implify/varname c(l)eaner/(u)nusedprune/ss(a) transform/
(r)edundant assign elim./re(m)aterializer/f(o)r-loop-pre-rewriter/
s(t)ructural simplifier/equi(v)alent function combiner/ssa re(V)erser/?
stack com(p)ressor/(D)ead code eliminator/(w)ord size transformer/?
w
----------------------
{
function swap(x_1, x_2, x_3, x_4, y_5, y_6, y_7, y_8) -> a_9, a_10, a_11, a_12, b_13, b_14, b_15, b_16
{
a_9 := y_5
a_10 := y_6
a_11 := y_7
a_12 := y_8
b_13 := x_1
b_14 := x_2
b_15 := x_3
b_16 := x_4
}
function main(v1_17, v1_18, v1_19, v1_20, v2_21, v2_22, v2_23, v2_24) -> r1_25, r1_26, r1_27, r1_28, r2_29, r2_30, r2_31, r2_32
{
let tmp_33 := 0
let tmp_34 := 0
let tmp_35 := 0
let tmp_36 := 7
v1_17 := 0
v1_18 := 0
v1_19 := 0
v1_20 := 9
r1_25, r1_26, r1_27, r1_28, r2_29, r2_30, r2_31, r2_32 := swap(v1_17, v1_18, v1_19, v1_20, tmp_33, tmp_34, tmp_35, tmp_36)
r1_25, r1_26, r1_27, r1_28, r2_29, r2_30, r2_31, r2_32 := swap(0, 0, 0, 42, r2_29, r2_30, r2_31, r2_32)
}
}
A Switch
can be implemented by 4-layer nesting Switch
. And If
can be translated to Switch
. (But I'm a little bit afraid that the code size will blow up...)
Looks great! Can you create a pull request for that?
Hm, I have to admit that I haven't thought about switch, that is a bit nasty.
If should be rather easy, though, you can use if x
-> if or(or(x_1, x_2), or(x_3, x_4))
. If we had real types, this would not be a problem since a bool
always fits 64 bits :)
Looks great! Can you create a pull request for that?
This work apparently depends on #6547 so I planned to open PR after #6547 merge. (Just to avoid the need to force push again)
you can use if x -> if or(or(x_1, x_2), or(x_3, x_4))
Oh! I actually can do this. Previously I was thinking that this will cause trouble because both EVM and eWasm have or
. But actually one is FunctionalInstruction
and the other is FunctionCall
. They are different in memory representation of yul AST.
Don't have any false illusions about not having to force-push :)
Corncerning the or: What is the set of builtin functions you want to use after splitting up the variables? You do need something in between don't you? Ah, and after https://github.com/ethereum/solidity/pull/6579 we probably plan to rename the wasm functions to i64.add
(just how they are called in the wasm spec) - so you could actually use a hybrid dialect that defines both sets of functions...
What is the set of builtin functions you want to use after splitting up the variables?
Probably only i64.or
for If
?
But doesn't that mean that you have to do both steps (splitting variables and changing functions) at the same time?
Hmm... Sorry I don't understand your question. After applying this WordSizeTransform
, the tasks left is to implement builtins like add(u64, u64, u64, u64, u64, u64, u64, u64) -> u64, u64, u64, u64
in WasmDialect
. Am I correct?
the tasks left is to implement builtins like
add(u64, u64, u64, u64, u64, u64, u64, u64) -> u64, u64, u64, u64
inWasmDialect
.
(This assumes that we want to support both add(u64, u64, u64, u64, u64, u64, u64, u64) -> u64, u64, u64, u64
and i64.add()
as builtin)
I think I lost some context here. I'll just wait what you come up with, if you don't mind :)
Sorry that I somewhat messed up the discussion. Here is the current planning in my mind (much simpler to implement!):
(A)
+----------+
| Solidity |
+----+-----+
|
| libsolidity/codegen/ir/
(B) |
+----+------------+
| strict assembly |
| (EVMDialect) |
+----+------------+
|
| Replace u256 identifiers with
| four u64 identifiers.
| (libyul/backends/wasm/WordSizeTransform.h)
(C) |
+----v--------------+
| strict assembly |
| (WasmDialect) |
| All types are u64 |
+----+--------------+
|
| libyul/backends/wasm/EWasmCodeTransform
(D) |
+----v---------------------+
| |
| wasm text representation |
| |
+----+---------------------+
|
| Append wasm implementation of EVM instructions
(E) |
+----v---------------------+
| |
| wasm text representation |
| |
+--------------------------+
Comparing to previous plan, major differences are:
We should generate a function call In #6556 wasm::Expression EWasmCodeTransform::operator()(FunctionalInstruction const&)
. For example, when encounter Instruction::ADD
, we generate a function call to builtin_evm_add
.
Simply append (prepend?) something like
(func $builtin_evm_add
(param $arg_1 i64)
(param $arg_2 i64)
(param $arg_3 i64)
(param $arg_4 i64)
(param $arg_5 i64)
(param $arg_6 i64)
(param $arg_7 i64)
(param $arg_8 i64)
(result i64)
;; actual implementation here
;; should follow calling conventions in EWasmCodeTransform (use set_global in case of multiple return values)
)
Actually I think we are very close to the MVP (?) that compiles "Instruction::ADD only" solidity code to working eWasm.
I think it would be better to do (D) -> (E) still in strict assembly mode (of course with different types and different builtins), because it allows us to still optimize the code afterwards, for example if we can determine that certain variables do not use higher order bits.
How about utilizing binaryen's optimization passes? it have wasm-specific optimization so probably is more effective comparing to general-purpose yul optimizations.
I would still like to use our optimizer if that is an easy thing to do. We can always use binaryen/llvm/whatever afterwards.
Ok! Closing this issue for now, as integer type issues are somewhat solved by #6600
Follow-up of https://github.com/ethereum/solidity/issues/2405#issuecomment-483624120
From my understanding, these are things that we can immediately start working on:
libsolidity/codegen/ir/IRGenerator
generate yul code that conforms to spec. Currently the code it generateFunctionalInstruction
s (e.gadd()
,xor()
) ---- it should only contain functions listed in https://solidity.readthedocs.io/en/v0.5.7/yul.html#low-level-functionslibyul/backends/evm
able to recognize low-level functions listed in doc to keep current--optimize-yul
feature working.