Proposal:
Context: reading this -- https://blog.soliditylang.org/2021/02/10/an-introduction-to-soliditys-fuzz-testing-approach/
From this and conversation with @bshastry I know there is a Yul optimization mode that, as I understand it, adds extra (possibly expensive?) checks to the validity of optimizations, used in the Yul protobuf fuzzing. I believe this works for arbitrary Yul, so for anything that would hit the optimizer.
I've been thinking about how to improve fuzzing for compilers in general, and one approach would be to just add a --slow-but-checked mode where a compiler may even be too slow for production use most of the time, but runs as many additional sanity checks as anyone is willing to code up. This article (https://cfallin.org/blog/2021/03/15/cranelift-isel-3/) that @rocallahan pointed me to got me thinking along these lines. As Rob has pointed out, it might even be worth compiling Mars code/systems code/high-value contracts using such a compiler mode, even if it was very slow, as a defense against wrong code.
In this case, it seems to me the check code exists and might not be too hard to integrate into the normal solc front end as an optional added check, maybe to be joined by others.
This is basically "solfuzzer binary on steroids, in main binary" I guess...
Thank you for your proposal @agroce
I suppose the proposal essentially entails providing a command line option to the fuzzers we currently run on OSS-Fuzz.
I'm not sure I understand the benefit. Is it meant to permit external testers to simply focus on input generation and feed such inputs to a mode such as solc --fuzz-yul-optimizer input.yul?
There is already the possibility of building the fuzzers we have and simply wiring them up to one's own input generator. This does not depend on the proposed feature, yet permits testing by external people.
For people doing fuzzing, that's right: it's a little more work, and somebody (maybe me, when I get some spare time) should make it a little easier to wire in arbitrary code, etc., but for fuzzing teams it's not that big a deal.
I think the real benefit here would be that you can compile real contracts in a mode where the compiler turns on every "debugging/fuzzing" check anyone's ever added to it, to minimize chance of wrong code. Even serious contract developers are unlikely to figure out how to run the fuzzing tools on their Yul, etc.
I don't know if this makes sense
(Well, it makes sense, I don't know if it's worth doing, or not)
I think the real benefit here would be that you can compile real contracts in a mode where the compiler turns on every "debugging/fuzzing" check anyone's ever added to it, to minimize chance of wrong code.
Not sure I follow. We are fuzzing the optimiser precisely to minimize chance of wrong code :-)
Even serious contract developers are unlikely to figure out how to run the fuzzing tools on their Yul, etc.
Should they? Bugs introduced because the compiler errs in correctly translating high-level smart contract logic into EVM byte code may be caught by unit tests written by smart contract developers. They know the program logic best and are therefore in the best position to write tests for it.
Of course it is possible that a unit test does not uncover a compiler bug. But I don't yet understand how the proposed feature might find this class of bugs, i.e., compiler bugs not uncovered by unit testing.
To find compiler bugs that may be missed by unit tests, one needs to fuzz the optimiser, which is precisely what we are doing :-)
Of course one can insert additional sanity checks to catch code generation errors. There are things that we are currently working on, e.g., using EVM execution tracing to more precisely compare unoptimised and optimised code. Contributions and PRs are always welcome :-)
I think the probability of a wrong-code bug seriously impacting a serious contract is pretty low, but I think based on our experience with (even very good) unit tests vs. source-level bugs, relying on unit tests to catch anything subtle is... not a great idea (see http://www.ifca.ai/fc20/preproceedings/75.pdf section 4.3). I may be misunderstanding. My thought was that the Yul checks can detect some bugs in optimization for any Yul code, so there's a chance, albeit quite small, it would notice something going wrong with the optimization of code for an arbitrary contract?
Fuzzing the optimizer is great, but it might not run on that specific contract Yul, or something with that bug? Maybe I'm missing something here?
From this and conversation with @bshastry I know there is a Yul optimization mode that, as I understand it, adds extra (possibly expensive?) checks to the validity of optimizations, used in the Yul protobuf fuzzing. I believe this works for arbitrary Yul, so for anything that would hit the optimizer.
@bshastry Where are these checks exactly? This is stuff that's not a part of the core compiler/optimizer and sits in fuzzer's code, right?
I think we'd need to discuss what is actually being checked and how strongly it is tied to fuzzer's implementation details to decide whether it makes sense to include that in the compiler's code and expose that through the CLI.
@bshastry Where are these checks exactly? This is stuff that's not a part of the core compiler/optimizer and sits in fuzzer's code, right?
Right. The checks are here, for example:
At the very least, exposing this as a frontend command line option would mean adding the Yul interpreter as a dependency to solc. Currently, the Yul interpreter is a tool and I'm not sure if it is released as such.
I guess this particular check could be reproduced using yulrun? I.e. getting both optimized and unoptimized Yul output from solc and then comparing traces from yulrun?
I guess this particular check could be reproduced using yulrun? I.e. getting both optimized and unoptimized Yul output from solc and then comparing traces from yulrun?
Yes, yulrun would do.
Are there any other checks?
Are there any other checks?
For the Yul optimiser, no. The Keccak bug was found by comparing the "trace" produced by evmone but that fuzzer is not upstream yet. In any case, the core of the check is the same except the trace is obtained via a call to the EVMHostPrinter instead of the Yul interpreter
So maybe we could just expose the EVMHostPrinter via a dedicated executable, similar to yulrun?
@agroce Would a shell script that runs all of these checks on a given Solidity file be enough? They depend on debug tools like evmone which I think do not provide much value to the average user not interested in testing the compiler. I think that a shell script that works when you have a local build of the compiler would be more lightweight and just as useful.
So maybe we could just expose the EVMHostPrinter via a dedicated executable, similar to yulrun?
Sounds reasonable to me. Sounds like isoltest++ :-)
Not sure how much effort this would entail, since the whole testing framework we use is rather complex to package as a black-box fuzzing binary. I suppose the benefit would be smart contract developers checking optimiser correctness, although I'm not too convinced about this. Perhaps the real audience for such a binary are security auditors.
I was thinking of something very simple, not involving any fuzzing. Basically just a yes/no check for a single file (i.e. are the optimized and unoptimized yulrun traces identical?). It would be up to the caller to plug the check into his fuzzing setup and generate interesting input files.
Basically just a yes/no check for a single file
Looked up old triage scripts and found something like this. Please don't ask me why :sweat_smile:
diff $(solc --strict-assembly --optimize $INPUT_YUL 2>&1| awk '/object/,/^}$/' > topt; yulrun --input-file topt > topt.trace; echo "topt.trace") $(solc --strict-assembly $INPUT_YUL 2>&1 | awk '/object/,/^}$/' > t; yulrun --input-file t > t.trace; echo "t.trace")
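For readability, here is roughly what that one-liner does, unrolled into steps. This is only a sketch: it assumes solc and yulrun are on the PATH and that the Yul source file is passed as the first argument.

```bash
#!/usr/bin/env bash
# Sketch: the triage one-liner above, split into separate steps.
set -u
INPUT_YUL="$1"

# Extract the Yul object from the optimized and unoptimized solc output.
solc --strict-assembly --optimize "$INPUT_YUL" 2>&1 | awk '/object/,/^}$/' > topt
solc --strict-assembly            "$INPUT_YUL" 2>&1 | awk '/object/,/^}$/' > t

# Interpret both versions and compare the execution traces.
yulrun --input-file topt > topt.trace
yulrun --input-file t    > t.trace
diff topt.trace t.trace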
Aha. I see, the checks depend on the standalone tool. Yeah, rolling that into solc is sort of not fun. For fuzzing purposes or for "final sanity check on contracts" it seems like this script based approach would work, and while not infinitely convenient, could certainly be made pretty trivial for those who want to do it.
Perhaps the real audience for such a binary are security auditors.
Yes, I think at minimum security auditors would be happy to run it, though a script will work for that if it's made obvious how.
For fuzzing, a script is actually not quite so great because it'd need to be wrapped up in C/C++ for afl to be happy and instrument things properly, but that doesn't sound too awful.
The other way to use in fuzzing is not to throw this in the inner fuzzing loop, but it looks like @bshastry has a diff that could just run through everything in an AFL corpus and check the corpus for sanity.
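As a sketch of that corpus-sweep idea: assuming the single-file check above is wrapped in a hypothetical check_one.sh, and that AFL's queue lives under out/default/queue (the layout depends on the AFL version and setup), it could be as simple as:

```bash
# Sketch: run the single-file trace check over every entry of an AFL queue.
# check_one.sh is a hypothetical wrapper around the diff command shown above.
for f in out/default/queue/id*; do
    if ! ./check_one.sh "$f" > /dev/null 2>&1; then
        echo "trace mismatch or crash for: $f"
    fi
done
```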
Indeed, I'm going to try that out right now :)
(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)
(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)
Would be curious to know roughly what percentage of afl-generated input is valid :-)
pure afl, probably almost none compile (haven't checked lately), but this variant, it's about 8% of the generated corpus.
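One rough way to measure that fraction (a sketch only; it just counts how many corpus entries solc accepts in --strict-assembly mode, relying on its non-zero exit status on errors; the corpus/ directory name is made up):

```bash
# Sketch: fraction of a corpus that compiles as strict assembly.
total=0; ok=0
for f in corpus/*; do
    total=$((total + 1))
    solc --strict-assembly "$f" > /dev/null 2>&1 && ok=$((ok + 1))
done
echo "$ok of $total inputs compile"
```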
(that number is for Yul, just generated as I'm hacking up a Python script to do that diffing with a little more informative output for me, no clue on Solidity without running some experiments)
hmmm
root@f408e72f5e43:~/solidity/test# ../build/test/tools/yulrun --input-file t
terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>'
what(): Unknown builtin: linkersymbol
Aborted
root@f408e72f5e43:~/solidity/test# ../build/test/tools/yulrun --input-file topt
Trace:
Memory dump:
Storage dump:
root@f408e72f5e43:~/solidity/test# cat foo.yul
{
let addr := linkersymbol("ract/library.sol8L")
}
// dialect: ====
// evm
// ----
Non-optimized object crashes and the optimized does not, because:
root@f408e72f5e43:~/solidity/test# cat topt
object "object" {
code { { } }
}
but
root@f408e72f5e43:~/solidity/test# cat t
object "object" {
code {
let addr := linkersymbol("ract/library.sol8L")
}
}
So just running on the arbitrary generated Yul might not work.
Non-optimized object crashes and the optimized does not, because:
root@f408e72f5e43:~/solidity/test# cat topt object "object" { code { { } } }
but
root@f408e72f5e43:~/solidity/test# cat t object "object" { code { let addr := linkersymbol("ract/library.sol8L") } }
What's happening here is that the built-in linkersymbol is not recognised by the Yul interpreter but recognised by the Yul parser. The optimised version optimises out addr because it is unused.
The main question is if we should support linkersymbol as a valid built-in inside the Yul interpreter but that's a separate issue :-)
BTW, the test case is probably from yul syntax test suite: https://github.com/ethereum/solidity/blob/develop/test/libyul/yulSyntaxTests/linkersymbol_evm.yul
pure afl, probably almost none compile (haven't checked lately), but this variant, it's about 8% of the generated corpus.
Interesting. What is the "variant" here?
Right, I crawl for all the Yul in the test subtree to start AFL running. The solution of the moment is to toss out anything where the foo.yul has 'linkersymbol' in it, for the moment. Doing so, then running.
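A minimal filter along those lines might look like this (a sketch; the corpus/ and filtered/ directory names are made up):

```bash
# Sketch: keep only corpus entries that do not mention the built-in
# the interpreter chokes on (linkersymbol here).
mkdir -p filtered
for f in corpus/*; do
    grep -q 'linkersymbol' "$f" || cp "$f" filtered/
done
```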
no clue on Solidity without running some experiments)
Please note that the diff command above is only meant for Yul input. We don't yet have a standalone tool to interpret EVM byte code via Solidity input.
yeah, I just meant what portion of the solidity files I hand solfuzzer are actually compiling.
The variant is: https://github.com/agroce/afl-compiler-fuzzer
Trail of Bits actually put out my blog post writeup of all this today: https://blog.trailofbits.com/2021/03/23/a-year-in-the-life-of-a-compiler-fuzzing-campaign/
Ok, we have a maybe more interesting case.
{
switch mload(0) default
{
//use x
let y := 0 mstore(y, 7)
}
let a
//t y := x can re-use a
let b := 0
sstore(a, b)
}
t:
object "object" {
code {
switch mload(0)
default {
let y := 0
mstore(y, 7)
}
let a
let b := 0
sstore(a, b)
}
}
topt:
object "object" {
code {
{
mstore(0, 7)
sstore(0, 0)
}
}
}
Again, unoptimized crashes:
terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>'
what(): Parsed successfully but had errors.
Aborted
Again, unoptimized crashes:
terminate called after throwing an instance of 'boost::wrapexcept<solidity::yul::YulAssertion>' what(): Parsed successfully but had errors. Aborted
We are reaching the limits of diff here :-)
This is expected because a switch with only a default case produces a warning. The optimised version does not bump into this problem because the switch is optimised out, since the default path is always taken.
Edit: Producing the warning is not the problem; the Yul interpreter insisting on warning-free code is.
yeah, I had just started to wonder if that's the issue. trying to think of a clean way to avoid this. maybe just throw out any diffs with "Parsed successfully but had errors" on the non-optimized side? is that likely to cause false negatives?
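One way to implement that "throw those diffs out" idea, sketched in shell (assuming t and topt are the unoptimized and optimized Yul objects, as in the earlier script):

```bash
# Sketch: only compare traces when the unoptimized run does not abort on the
# "Parsed successfully but had errors" assertion.
if yulrun --input-file t > t.trace 2> t.err; then
    yulrun --input-file topt > topt.trace
    diff t.trace topt.trace
elif grep -q "Parsed successfully but had errors" t.err; then
    echo "skipped: input parses only with errors/warnings"
else
    echo "unoptimized run failed for another reason; worth a closer look"
    cat t.err
fi
```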
Hmmm.. How long should yulrun take to run:
object "object" {
code {
{
for { } 1 { }
{
for { } 1 { }
{
for { } 1 { }
{ }
}
}
}
}
}
The main question is if we should support linkersymbol as a valid built-in inside the Yul interpreter but that's a separate issue :-)
Yeah, we should: https://github.com/ethereum/solidity/pull/10296#issuecomment-728412756 I dropped that PR because I did not have time to do it properly but it is something that we need to handle eventually.
Hmmm.. How long should yulrun take to run:
It's an infinite loop so... infinite time? :) I'm not sure if yulrun has a timeout.
Yeah, that was my guess. Handled in-script. If both opt and unopt timeout with 10 seconds limit, I call it "equal behavior"
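Sketched with coreutils timeout, that rule might look like this (exit status 124 means the run was killed at the time limit; t and topt as in the earlier script):

```bash
# Sketch: treat a mutual 10-second timeout as "equal behaviour".
timeout 10s yulrun --input-file t    > t.trace;    rc_unopt=$?
timeout 10s yulrun --input-file topt > topt.trace; rc_opt=$?
if [ "$rc_unopt" -eq 124 ] && [ "$rc_opt" -eq 124 ]; then
    echo "both timed out -- treating as equal behaviour"
else
    diff t.trace topt.trace
fi
```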
@bshastry oh, that 8% number was wrong; I was running on an early fuzz run where something was configured wrong. Running on the actual afl output directory, it's closer to 33%.
Ok, with the real data and 10 sec timeout, checking the full afl corpus is going to take a while! I'll let you know if anything shows up (bug report if obviously bad, here if likely tuning the comparison check)
First fuzzer instance is done. 3291 generated Yul files. Of these, 1181 (~36%) compiled. No yulrun comparison bugs, other than ones with the "Parsed but..." structure, of which there were 11. But afl isn't too deep into the yul optimizer yet, only 4 days worth of fuzzing, with a few hundred pending inputs in the queue for each of the 3 instances.
Didn't look in after I left the office, but before I took off for the day this run did produce #11148 -- though a bit of a surprise, in that the issue shows up without optimization. Suggests maybe fuzzing some with no-opt (since optimization probably cleans up the Yul a good deal) might be a way to push the assembler through some odd stuff...
Should I remove memoryguard from consideration as I did with linkersymbol?
Related issue: #6495
This issue has been marked as stale due to inactivity for the last 90 days. It will be automatically closed in 7 days.
Hi everyone! This issue has been automatically closed due to inactivity. If you think this issue is still relevant in the latest Solidity version and you have something to contribute, feel free to reopen. However, unless the issue is a concrete proposal that can be implemented, we recommend starting a language discussion on the forum instead.
(I have a few thousand afl-generated Yul inputs, probably most of which don't compile, but I'm about to see)
May I ask for the tool (yul-based afl) link? I would like to learn about it
No specific tool, but use https://github.com/agroce/afl-compiler-fuzzer and run part of the solidity toolchain that processes yul (I've forgotten what!) starting from a corpus of valid yul, and you should get something interesting.