ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.37k stars 5.78k forks source link

solc mode that runs the Yul optimization checks, other expensive compiler checks #11132

Closed agroce closed 1 year ago

agroce commented 3 years ago

Proposal:

Context: reading this -- https://blog.soliditylang.org/2021/02/10/an-introduction-to-soliditys-fuzz-testing-approach/

From this and conversation with @bshastry I know there is a Yul optimization mode that, as I understand at, adds extra (possibly expensive?) checks to the validity of optimizations, used in the Yul protobuf fuzzing. I believe this works for arbitrary Yul, so for anything that would hit the optimizer. I've been thinking about how to improve fuzzing for compilers in general, and one approach would be to just add a --slow-but-checked mode where a compiler may even be too slow for production use most of the time, but runs as many additional sanity checks as anyone is willing to code up. This article (https://cfallin.org/blog/2021/03/15/cranelift-isel-3/) that @rocallahan pointed me to got me thinking along these lines. As Rob has pointed out, it might even be worth compiling Mars code/systems code/high value contracts using such a compiler mode, even if it was very slow, as a defense against wrong code. In this case, it seems to me the check code exists and might not be too hard to integrate into the solc normal front end as an optional added check, maybe to be joined by others.

This is basically "solfuzzer binary on steroids, in main binary" I guess...

bshastry commented 3 years ago

Thank you for your proposal @agroce

I suppose the proposal essentially entails providing a command line option to the fuzzers we currently run on oss fuzz.

I'm not sure I understand the benefit. Is it meant to permit external testers to simply focus on input generation and feed such inputs to a mode such as solc --fuzz-yul-optimizer input.yul?

There is already the possibility of building the fuzzers we already have and simply wiring it with one's own input generator. This does not depend on the proposed feature, yet permits testing by external people.

agroce commented 3 years ago

For people fuzzing, that's right, it's a little more work and somebody (me when I get some spare time maybe) should make it a little easier to wire in arbitrary code etc., but for fuzzing teams it's not that big a deal.

I think the real benefit here would be that you can compile real contracts in a mode where the compiler turns on every "debugging/fuzzing" check anyone's ever added to it, to minimize chance of wrong code. Even serious contract developers are unlikely to figure out how to run the fuzzing tools on their Yul, etc.

agroce commented 3 years ago

I don't know if this makes sense

agroce commented 3 years ago

(Well, it makes sense, I don't know if it's worth doing, or not)

bshastry commented 3 years ago

I think the real benefit here would be that you can compile real contracts in a mode where the compiler turns on every "debugging/fuzzing" check anyone's ever added to it, to minimize chance of wrong code.

Not sure I follow. We are fuzzing the optimiser precisely to minimize chance of wrong code :-)

Even serious contract developers are unlikely to figure out how to run the fuzzing tools on their Yul, etc.

Should they? Those bugs that are introduced by the compiler because the compiler errs in correctly translating high level smart contract logic into EVM byte code may be caught by unit tests written by smart contract developers. They know the program logic best and therefore are in the best position to write tests for them.

Of course it is possible that a unit test does not uncover a compiler bug. But I don't yet understand how the proposed feature might find this class of bugs i.e., compiler bugs uncovered by unit testing.

To find compiler bugs that may be missed by unit tests, one needs to fuzz the optimiser which is precisely what we are doing :-)

Of course one can insert additional sanity checks to catch code generation errors. There are things that we are currently working on e.g., using EVM execution tracing to more precisely compare unoptimized and optimised code. Contributions and PRs are always welcome :-)

agroce commented 3 years ago

I think the probability of a wrong-code bug seriously impacting a serious contract is pretty low, but I think based on our experience with (even very good) unit tests vs. source-level bugs, relying on unit tests to catch anything subtle is... not a great idea (see http://www.ifca.ai/fc20/preproceedings/75.pdf section 4.3). I may be misunderstanding. My thought was that the Yul checks can detect some bugs in optimization for any Yul code, so there's a chance, albeit quite small, it would notice something going wrong with the optimization of code for an arbitrary contract?

Fuzzing the optimizer is great, but it might not run on that specific contract Yul, or something with that bug? Maybe I'm missing something here?

cameel commented 3 years ago

From this and conversation with @bshastry I know there is a Yul optimization mode that, as I understand at, adds extra (possibly expensive?) checks to the validity of optimizations, used in the Yul protobuf fuzzing. I believe this works for arbitrary Yul, so for anything that would hit the optimizer.

@bshastry Where are these checks exactly? This is stuff that's not a part of the core compiler/optimizer and sits in fuzzer's code, right?

I think we'd need to discuss what is actually being checked and how strongly it is tied to fuzzer's implementation details to decide whether it makes sense to include that in the compiler's code and expose that through the CLI.

bshastry commented 3 years ago

@bshastry Where are these checks exactly? This is stuff that's not a part of the core compiler/optimizer and sits in fuzzer's code, right?

Right. The checks are here, for example:

https://github.com/ethereum/solidity/blob/8d00100c4e9d211830a5dbf6a1f964d98648480d/test/tools/ossfuzz/strictasm_diff_ossfuzz.cpp#L100-L101

At the very least, exposing this as a frontend command line option would mean adding the Yul interpreter as a dependency to solc. Currently, the Yul interpreter is a tool and I'm not sure if it is released as such.

cameel commented 3 years ago

I guess this particular check could be reproduced using yulrun? I.e. getting both optimized and unoptimized Yul output from solc and then comparing traces from yulrun?

bshastry commented 3 years ago

I guess this particular check could be reproduced using yulrun? I.e. getting both optimized and unoptimized Yul output from solc and then comparing traces from yulrun?

Yes, yulrun would do.

cameel commented 3 years ago

Are there any other checks?

bshastry commented 3 years ago

Are there any other checks?

For the Yul optimiser, no. The Keccak bug was found by comparing the "trace" produced by evmone but that fuzzer is not upstream yet. In any case, the core of the check is the same except the trace is obtained via a call to the EVMHostPrinter instead of the Yul interpreter

See https://github.com/ethereum/solidity/blob/8d00100c4e9d211830a5dbf6a1f964d98648480d/test/EVMHost.cpp#L778

cameel commented 3 years ago

So maybe we could just expose the EVMHostPrinter via a dedicated executable, similar to yulrun?

@agroce Would a shell script that runs all of these checks on a given Solidity file be enough? They depend on debug tools like evmone which I think do not provide much value to the average user not interested in testing the compiler. I think that a shell script that works when you have a local build of the compiler would be more lightweight and just as useful.

bshastry commented 3 years ago

So maybe we could just expose the EVMHostPrinter via a dedicated executable, similar to yulrun?

Sounds reasonable to me. Sounds like isoltest++ :-)

Not sure how much effort this would entail since the whole testing framework we use is rather complex to package as a black-box fuzzing binary. I suppose the benefit is that smart contract developer checking optimiser correctness although I'm not too convinced about this. Perhaps the real audience for such a binary are security auditors.

cameel commented 3 years ago

I was thinking of something very simple, not involving any fuzzing. Basically just a yes/no check for a single file (i.e. are the optimized and unoptimized yulrun traces identical?). It would be up to the caller to plug the check into his fuzzing setup and generate interesting input files.

bshastry commented 3 years ago

Basically just a yes/no check for a single file

Looked up old triage scripts and found something like this. Please don't ask me why :sweat_smile:

diff $(solc --strict-assembly --optimize $INPUT_YUL 2>&1| awk '/object/,/^}$/' > topt; yulrun --input-file topt > topt.trace; echo "topt.trace") $(solc --strict-assembly $INPUT_YUL 2>&1 | awk '/object/,/^}$/' > t; yulrun --input-file t > t.trace; echo "t.trace")
agroce commented 3 years ago

Aha. I see, the checks depend on the standalone tool. Yeah, rolling that into solc is sort of not fun. For fuzzing purposes or for "final sanity check on contracts" it seems like this script based approach would work, and while not infinitely convenient, could certainly be made pretty trivial for those who want to do it.

agroce commented 3 years ago

Perhaps the real audience for such a binary are security auditors.

Yes, I think at minimum security auditors would be happy to run it, though if it's made obvious a script will work for that.

For fuzzing, a script is actually not quite so great because it'd need to be wrapped up in C/C++ for afl to be happy and instrument things properly, but that doesn't sound too awful.

agroce commented 3 years ago

The other way to use in fuzzing is not to throw this in the inner fuzzing loop, but it looks like @bshastry has a diff that could just run through everything in an AFL corpus and check the corpus for sanity.

agroce commented 3 years ago

Indeed, I'm going to try that out right now :)

agroce commented 3 years ago

(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)

bshastry commented 3 years ago

(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)

Would be curious to know roughly what percentage of afl-generated input is valid :-)

agroce commented 3 years ago

pure afl, probably almost none compile (haven't checked lately), but this variant, it's about 8% of the generated corpus.

agroce commented 3 years ago

(that number is for Yul, just generated as I'm hacking up a Python script to do that diffing with a little more informative output for me, no clue on Solidity without running some experiments)

agroce commented 3 years ago

hmmm

root@f408e72f5e43:~/solidity/test# ../build/test/tools/yulrun --input-file t
terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>'
  what():  Unknown builtin: linkersymbol
Aborted
root@f408e72f5e43:~/solidity/test# ../build/test/tools/yulrun --input-file topt
Trace:
Memory dump:
Storage dump:
root@f408e72f5e43:~/solidity/test# cat foo.yul 
{
    let addr := linkersymbol("ract/library.sol8L")
}
// dialect: ====
// evm
// ----
agroce commented 3 years ago

Non-optimized object crashes and the optimized does not, because:

root@f408e72f5e43:~/solidity/test# cat topt
object "object" {
    code { { } }
}

but

root@f408e72f5e43:~/solidity/test# cat t
object "object" {
    code {
        let addr := linkersymbol("ract/library.sol8L")
    }
}
agroce commented 3 years ago

So just running on the arbitrary generated Yul might not work.

bshastry commented 3 years ago

Non-optimized object crashes and the optimized does not, because:

root@f408e72f5e43:~/solidity/test# cat topt
object "object" {
    code { { } }
}

but

root@f408e72f5e43:~/solidity/test# cat t
object "object" {
    code {
        let addr := linkersymbol("ract/library.sol8L")
    }
}

What's happening here is that the built-in linkersymbol is not recognised by the yul interpreter but recognised by the Yul parser. The optimised version optimises out addr because it is unused.

The main question is if we should support linkersymbol as a valid built-in inside the Yul interpreter but that's a separate issue :-)

BTW, the test case is probably from yul syntax test suite: https://github.com/ethereum/solidity/blob/develop/test/libyul/yulSyntaxTests/linkersymbol_evm.yul

bshastry commented 3 years ago

pure afl, probably almost none compile (haven't checked lately), but this variant, it's about 8% of the generated corpus.

Interesting. What is the "variant" here?

agroce commented 3 years ago

Right, I crawl for all the Yul in the test subtree to start AFL running. The solution of the moment is to toss out anything where the foo.yul has 'linkersymbol' in it, for the moment. Doing so, then running.

bshastry commented 3 years ago

no clue on Solidity without running some experiments)

Please note that the diff command above is only meant for Yul input. We don't have a standalone tool to interpret EVM byte code yet via Solidity input yet.

agroce commented 3 years ago

yeah, I just meant what portion of the solidity files I hand solfuzzer are actually compiling.

The variant is: https://github.com/agroce/afl-compiler-fuzzer

Trail of Bits actually put out my blog post writeup of all this today: https://blog.trailofbits.com/2021/03/23/a-year-in-the-life-of-a-compiler-fuzzing-campaign/

agroce commented 3 years ago

Ok, we have a maybe more interesting case.

{
    switch mload(0) default 
{
    //use x
    let y := 0 mstore(y, 7)
    }
let a
    //t y := x can re-use a
    let b := 0
    sstore(a, b)
}

t:

object "object" {
    code {
        switch mload(0)
        default {
            let y := 0
            mstore(y, 7)
        }
        let a
        let b := 0
        sstore(a, b)
    }
}

topt:

object "object" {
    code {
        {
            mstore(0, 7)
            sstore(0, 0)
        }
    }
}
agroce commented 3 years ago

Again, unoptimized crashes:

terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>'
  what():  Parsed successfully but had errors.
Aborted
bshastry commented 3 years ago

Again, unoptimized crashes:

terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>'
  what():  Parsed successfully but had errors.
Aborted

We are reaching the limits of diff here :-)

This is expected because a switch case with only a default statement produces a warning. The optimised version does not bump into this problem because the switch is optimised out because the default path is always taken.

Edit: Producing the warning is not the problem but the Yul interpreter asserting on warning free code is

See https://github.com/ethereum/solidity/blob/e3ea5c631e9c3614ed6b2bff4806a670f769c682/test/tools/yulrun.cpp#L73

agroce commented 3 years ago

yeah, I had just started to wonder if that's the issue. trying to think of a clean way to avoid this. maybe just throw out any diffs with "Parsed successfully but had errors" on the non-optimized side? is that likely to cause false negatives?

agroce commented 3 years ago

Hmmm.. How long should yulrun take to run:

object "object" {
    code {
        {
            for { } 1 { }
            {
                for { } 1 { }
                {
                    for { } 1 { }
                    { }
                }
            }
        }
    }
}
cameel commented 3 years ago

The main question is if we should support linkersymbol as a valid built-in inside the Yul interpreter but that's a separate issue :-)

Yeah, we should: https://github.com/ethereum/solidity/pull/10296#issuecomment-728412756 I dropped that PR because I did not have time to do it properly but it is something that we need to handle eventually.

Hmmm.. How long should yulrun take to run:

It's an infinite loop so... infinite time? :) I'm not sure if yulrun has a timeout.

agroce commented 3 years ago

Yeah, that was my guess. Handled in-script. If both opt and unopt timeout with 10 seconds limit, I call it "equal behavior"

agroce commented 3 years ago

@bshastry oh, that 8% number was wrong; I was running on an early fuzz run where something was configured wrong, running on the actual afl output directory it's closer to 33%

agroce commented 3 years ago

Ok, with the real data and 10 sec timeout, checking the full afl corpus is going to take a while! I'll let you know if anything shows up (bug report if obviously bad, here if likely tuning the comparison check)

agroce commented 3 years ago

First fuzzer instance is done. 3291 generated Yul files. Of these, 1181 (~36%) compiled. No yulrun comparison bugs, other than ones with the "Parsed but..." structure, of which there were 11. But afl isn't too deep into the yul optimizer yet, only 4 days worth of fuzzing, with a few hundred pending inputs in the queue for each of the 3 instances.

agroce commented 3 years ago

Didn't look in after I left office, but before I took off for the day this run did produce #11148 -- though a bit of a surprise, in that without optimization is the issue. Suggests maybe fuzzing some on no-opt (since optimization probably cleans up the Yul a good deal) might be a way to push the assembler through some odd stuff...

agroce commented 3 years ago

Should I remove memoryguard from consideration as I did with linkersymbol?

cameel commented 3 years ago

Related issue: #6495

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to inactivity for the last 90 days. It will be automatically closed in 7 days.

github-actions[bot] commented 1 year ago

Hi everyone! This issue has been automatically closed due to inactivity. If you think this issue is still relevant in the latest Solidity version and you have something to contribute, feel free to reopen. However, unless the issue is a concrete proposal that can be implemented, we recommend starting a language discussion on the forum instead.

Subway2023 commented 10 months ago

(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)

May I ask for the tool (yul-based afl) link? I would like to learn about it

agroce commented 10 months ago

No specific tool, but use https://github.com/agroce/afl-compiler-fuzzer and run part of the solidity toolchain that processes yul (I've forgotten what!) starting from a corpus of valid yul, and you should get something interesting.