SeUniVr / EtherSolve

Source code of EtherSolve: static analysis of Ethereum bytecode
MIT License
65 stars 15 forks source link

IndexOutOfBoundException with a contract #4

Open tanmaster opened 2 years ago

tanmaster commented 2 years ago

First of all: great tool, thanks for your efforts!

I have tried to execute EtherSolve on the 1k given contract bytecodes. I found three errors, but I assume you are aware of them since you put the contracts into the dataset yourself. For the first two errors I could figure out the reason:

For the third error, I tried stepping around in the debugger and I found that it fails at the 3rd line in the following function:

private void executeSwap(SwapOpcode opcode) {
    int i = stack.size() - opcode.getValue() - 1;
    int j = stack.size() - 1;
    BigInteger tmp = stack.get(i);  // Exception is thrown here
    stack.set(i, stack.get(j));
    stack.set(j, tmp);
}

The reason is that the stack has too little content to execute a SWAP4 opcode (there are 4 elements, but 5 are needed). My question is: Since EtherSolve still generates a json output file with an okay looking CFG, is it save to use? Or could the json contain wrong/missing data? What does the error mean? Could it maybe a bug in the compiler that compiled the contract?

Maybe I should mention that I am not wondering about this contract in particular, but about errors in EtherSolve in general and how to handle them if I encounter any since I plan on using it on a large amount of contracts :-)

Also, just in case, here's the script I used:

FILES="Benchmark/Bytecode-dataset-1000-contracts/*"
for f in $FILES
do
  a="$(basename -- "$f")"
  java -jar EtherSolve.jar -jr "Benchmark/Bytecode-dataset-1000-contracts/$a" -o "out/$a.json"
done

Thanks again for your work, very cool!

VersusF commented 2 years ago

Hello and thanks for your issue, especially for your detailed analysis. We are aware of these three fails, but they are part of the randomly selected dataset. Our hypothesis is that, during the CFG walk, the symbolic execution travels paths that should not be reachable on a real execution (due to conditional statements); probably the SWAP4 is inside a loop which pops element from the stack.

Of course, this may happen on other contracts too. Nevertheless, EtherSolve builds its CFG incrementally, so the output is the CFG obtained at that point of the symbolic execution. It is more likely to be a partial, or incomplete CFG rather than a wrong one.

A reasonable idea could be to catch those errors, mark the path as "anomalous", prune it from the queue and continue with the analysis. This may also produce more complete CFGs.

Does this sound good to you? Or do you think it's a hacky solution?

tanmaster commented 2 years ago

Thank you for your explanation!

A reasonable idea could be to catch those errors, mark the path as "anomalous", prune it from the queue and continue with the analysis. This may also produce more complete CFGs.

Does this sound good to you? Or do you think it's a hacky solution?

That sounds like a very reasonable solution to me too :+1:

I would love to help with implementing, but I think I'm lacking the knowledge and oversight within your project to do so effectively. I'll gladly help testing it with a bunch of contracts whenever you decide to implement it though!