ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.28k stars 5.77k forks source link

Replace rule list with "DSL" #7520

Open ekpyron opened 5 years ago

ekpyron commented 5 years ago

TBD

We can probably just use a triplet of Yul expressions, resp. expression patterns:

The match expression can use symbolic pattern variables for matching variables and constants and (at a later stage when extending this from expressions to statements) arbitrary code blocks. Precondition and replacement expressions can use any pattern variables used in the match expression.

christianparpart commented 5 years ago

@ekpyron can you give some example? (or in other words: when can we talk face-to-face about it?) I'm interested in taking this one, but I'd need some more (precise) input of what you have in mind. :)

ekpyron commented 5 years ago

This issue was mainly meant as a reminder to discuss this further - the idea came up with @chriseth at devcon, but I think this might need some more careful and future-proof planning before we actually start with it.

chriseth commented 4 years ago

Examples for potential syntax:

-EVM hasSelfbalance
// need to specify dialect, with restrictions on some rules

balance(address())
selfbalance()

//comment

add(#A, #B)
@+(A, B)

add(X, 0)
X

// rules that remove or reorder placeholders
// only match if the placeholders are movable
// This is a feature of the engine
mul(X, 0)
0

if 1 { T }
{ T }

if 0 { T }
{}

// what are the conditions on T here?
T revert(X, Y)
revert(X, Y)

// can we have "same match" on full statements?
switch X case 0 { T1... T } default { T2... T}
switch X case 0 { T1... } default { T2... } T 
axic commented 3 years ago

There's also #5542 as a relevant issue.

axic commented 3 years ago

first line is match expression, second line is replacement

If we have such a system, I think < and > prefixes would be nice (diff-style), e.g.:

< balance(address())
> selfbalance()

But @ekpyron's proposal in #5542 of defining equalities may be more intuitive:

balance(address()) == selfbalance()
axic commented 3 years ago
  1. In this the core syntax is equality of expressions (from @ekpyron). Where there is no loss of information (replacing instructions with faster instructions, such as shifts), it could reversible too.
  2. A condition can be introduced via iff <expr>. Expressions here must be "constexpr". We borrow the idea from Rust to mark these constexpr versions (macros) with exclamation mark, e.g. sub!, lt!, etc.
  3. Any number of whitespace is allowed between both the rewrite rule and the condition.
  4. Expressions are EVM opcodes (or yul dialect builtins).
  5. Conditions are a different set of expressions/builtins. Special conditionals are introduced for wordsize, bit testing, etc.
  6. The terms A, B, C mean literal on the stack, while X, Y, Z mean non-literals on the stack.
  7. Properties/attributes are introduced via [@<key> <value>], eg. [@evm hasSelfbalance]. It is only valid for the following rewrite rule.
  8. // marks comments.

I played with the syntax a bit and tried to translate a set of random steps from RuleList.h.

//
// Simple rules
//

[@evm hasSelfbalance]
balance(address()) == selfbalance()

exp(0, X) == iszero(X)

exp(1, X) == 1

[@evm hasBitwiseShifting]
exp(2, X) == shl(X, 1)

exp(-1, X) == sub(iszero(and(X, 1)), and(X, 1))

// move constants across subtractions
// X - A -> X + (-A)
sub(X, A) == add(X, sub!(0, A))

//
// Rules with conditions
//

//        rules.push_back({
//                Builtins::BYTE(A, Builtins::SHR(B, X)),
//                [=]() -> Pattern { return Word(0); },
//                [=] { return A.d() < B.d() / 8; }
//        });
byte(A, shr(B, X)) == 0 iff lt!(A, div!(B, 8))

//        // Replace MOD X, <power-of-two> with AND X, <power-of-two> - 1
//        for (size_t i = 0; i < Pattern::WordSize; ++i)
//        {
//                Word value = Word(1) << i;
//                rules.push_back({
//                        Builtins::MOD(X, value),
//                        [=]() -> Pattern { return Builtins::AND(X, value - 1); }
//                });
//        }
//
// In these cases it may make more sense to just expand it manually.
mod(X, 1) == and(X, 0)
mod(X, 2) == and(X, 1)
mod(X, 4) == and(X, 3)
// But could also consider adding a pseudo-variable.
mod(X, shl!(1, K)) == and(X, sub!(shl!(1, K), 1))) iff and!(gte!(K, 0), lte!(K, wordsize!()))

// Also extra whitespaces can be added for readability.
mod(X, shl!(1, K)) == and(X, sub!(shl!(1, K), 1)))
   iff
and!(
  gte!(K, 0),
  lte!(K, wordsize!())
)

//                {Builtins::SHL(A, B), [=]{
//                        if (A.d() >= Pattern::WordSize)
//                                return Word(0);
//                        return shlWorkaround(B.d(), unsigned(A.d()));
//                }},
shl(A, B) == 0 iff gte!(A, wordsize!())
shl(A, B) == shl!(A, B)

// {Builtins::LT(A, B), [=]() -> Word { return A.d() < B.d() ? 1 : 0; }},
lt(A, B) == 0 iff gte!(A, B)
lt(A, B) == 1 iff lt!(A, B)
axic commented 3 years ago

If we want to not have magic variable names, then as mentioned we need a way to mark literals. I think the # prefix on the match is an okay way do it, but probably makes sense to keep that prefix on both sides.

axic commented 3 years ago

Here are some more, line-by-line translations:

 add(#A, #B) == add!(A, B)
 mul(#A, #B) == mul!(A, B)
 byte(#A, #B) == 0 iff gte!(A, div!(wordsize!(), 8))
 byte(#A, #B) == shr!(B, and!(mul!(8, sub!(sub!(div!(wordsize!(), 8), 1), A)), 0xff)) iff lt!(A, div!(wordsize!(), 8))
 shl(#A, #B) == 0 iff gte!(A, wordsize!())
 shl(#A, #B) == shl!(A, B)
 shr(#A, #B) == 0 iff gte!(A, wordsize!())
 shr(#A, #B) == shr!(A, B)
 sub(not!(0), X) == not!(0)
 mul(X, not!(0)) == sub(0, X)
 mul(not!(0), X) == sub(0, X)
 and(X, not!(0)) == X
 and(not!(0), X) == X
 eq(X, 0) == iszero(X)
 eq(0, X) == iszero(X)
 gt(X, 0) == iszero(iszero(X))
 lt(0, X) == iszero(iszero(X))
 gt(X, not!(0)) == 0
 lt(not!(0), X) == 0
 and(byte(X, Y), 0xff) == byte(X, Y)
 byte(sub!(div!(wordsize!(), 8), 1), X) == and(X, 0xff)
 not(not(X) == X
 xor(X, xor(X, Y)) == Y
 or(not(X), X) == not!(0)
 and(and(X, Y), Y) == and(X, Y)
 mod(X, 1) == and(X, 0)
 mod(X, 2) == and(X, 1)
 mod(X, 4) == and(X, 3)
 mod(X, shl!(1, K)) == and(X, sub!(shl!(1, K), 1))) iff and!(gte!(K, 0), lte!(K, wordsize!()))
 byte(#A, shr(#B, X)) == 0 iff lt!(A, div!(B, 8))
 and(address(), sub!(shl!(1, 160), 1)) == address()
 and(caller(), sub!(shl!(1, 160), 1)) == caller()
 and(origin(), sub!(shl!(1, 160), 1)) == origin()
 and(coinbase(), sub!(shl!(1, 160), 1)) == coinbase()
 iszero(iszero(eq(X, Y))) == eq(X, Y)
 iszero(iszero(lt(X, Y))) == lt(X, Y)
 iszero(iszero(slt(X, Y))) == slt(X, Y)
 iszero(iszero(gt(X, Y))) == gt(X, Y)
 iszero(iszero(sgt(X, Y))) == sgt(X, Y)
 iszero(iszero(iszero(X))) == iszero(X)
 iszero(xor(X, Y)) == eq(X, Y)
 byte(#A, shr(#B, X)) == 0 iff lt!(A, div!(B, 8))
 sub(X, #A) == add(X, sub!(0, A))
 sub(add(X, #A), Y) == add(sub(X, Y), A)
 [@evm hasSelfbalance]
 balance(address()) == selfbalance()
 exp(0, X) == iszero(X)
 exp(1, X) == 1
 [@evm hasBitwiseShifting]
 exp(2, X) == shl(X, 1)
 exp(-1, X) == sub(iszero(and(X, 1)), and(X, 1))
chriseth commented 3 years ago

This looks very good! Some remarks:

Finally: Why not have

 exp(2, X) == shl(X, 1) if  evm!(hasBitwiseShifting)

instead of

 [@evm hasBitwiseShifting]
 exp(2, X) == shl(X, 1)
axic commented 3 years ago

I'm not sure about the == if the two sides are not really equal (you can use lt!(#A, #B) on the RHS but not on the LHS).

You can use the evaluated terms on either side, but perhaps it is better to go with the direction indicator -> and not allowing implicit reversibility.

maybe we should replace wordsize by bitsPerWord and bytesPerWord

We could also decide to make the rulelist dialect specific and then this is not needed.

Why not have evm!(hasBitwiseShifting)

That also sounds like an option, but would make it look like the datasize builtin and use a literal and not identifier: evm!("hasBitwiseShifting")

maybe use !lt instead of lt!?

Not sure, i personally like the ! suffix better.

ekpyron commented 3 years ago

I just saw this discussion and would bring up the following:

So the current simplifications can be replaced by the following steps:

But yeah, just in case you want to consider this :-). It's not really an issue to first translate the rules into the verbose format, because we can still change it by simply removing any ! and # at any later point. I'd suggest to group "evaluation rules" and proper simplification rules together, though. Also my suggestion is tailored to the new optimizer and only the new optimizer. If we want to use the DSL for the old optimizer as well, we'll probably need these annotations, because we probably wouldn't want to rewrite any of that this much (for the new optimizer on the other hand I would argue that it would run more smoothly and be significantly simpler with the more general rules and split out "pure builtin function evaluation").

chriseth commented 3 years ago

Your proposal sounds very good to me, @ekpyron!

I'm wondering how the "redundant masking" rule would look like in his framework.

and(x, y) == x if eq(and(x, y), x)

This looks trivial, but we have to given the reasoning engine something to work with.

ekpyron commented 3 years ago

That's tricky... and(x, y) == x if eq(and(x, y), x) would "just work" in the framework I described without any special care whenever x and y are constants, resp. have known values, but then it is trivial and the rule wouldn't be needed anyways, because and(x,y) could just directly be evaluated.

For non-constants, we could try to optimize the reasoning engine for special cases like y is a constant power-of-two-minus-one and x has an upper bound - that'd be possible. Actually one could even formulate a rule specifically for that case that is easy to evaluate without special support in the reasoning engine - and(iszero(iszero(y)), iszero(and(y, sub(y, 1)))) should be "y is a power of two", so and(x, y) == x if and(and(iszero(iszero(add(y,1))), iszero(and(add(y, 1),y))), iszero(gt(x,y))) is something that should work in this special case and should be evaluatable for constant y using simple inequality reasoning about x and y only... but I wouldn't trust that I got that right without SMT-veryfing that rule :-) - and then the question is, where does it end and the more cases we want to cover the more this task would become "implement an SMT solver in the rule list or its reasoning engine", which is way beyond the scope of the original idea of this issue...

But this shows how complex reasoning engines can help applying rules in this framework - if we were to simultanously SMT-encode the Yul AST while applying the rules, then we could query an SMT solver to show that iszero(<condition>) is UNSAT (probably with a very small timeout, since otherwise it'd slow things down too much) - and if it is, apply the rule. Actually this could even replace "pure constant function evaluation" (at least when applying the rules), if we supplied SMT semantics for all "pure" builtins - I'd expect SMT solvers to just instantly evaluate those on their own then and eq(constant1, constant2) would be instantly SAT or UNSAT as well. But we're back to the trust issues against SMT solvers plus it'd still probably degrade performance...

But yeah, my idea was not to do anything like that or any reasoning whatsoever in a first step, but just have plain dumb evaluation of constants for the conditions - and in that case a rule for redundant masking wouldn't be possible as far as I can see.