Consensys / mythril

Security analysis tool for EVM bytecode. Supports smart contracts built for Ethereum, Hedera, Quorum, Vechain, Rootstock, Tron and other EVM-compatible blockchains.
https://mythx.io/
MIT License
3.88k stars 741 forks source link

How to make Mythril analyze all functions to their regular end at RETURN/STOP? #1777

Open gsalzer opened 1 year ago

gsalzer commented 1 year ago

I try to understand how Mythril/Laser traverses the execution traces, with the aim to let it reach the regular end of as many functions as possible. For testing, I added a module Function in a file mythril/analysis/module/modules/function.py, see below for the code. Its purpose is to report the name of the current function when reaching its regular end (RETURN). I run it as

myth analyze -f ShieldController.hex -m Function

where ShieldController.hex is the contract creation code at https://etherscan.io/address/0x860b3913e248e6ba352120d550567379cb48fdd6#code.

The expected output are the functions listed by solc --optimize --hashes ShieldController.sol for contract ShieldController. However, the call above just reports the constructor and a function transferFrom(address,address,uint256), which does not occur in ShieldController, but seems to be a function in a contract deployed by ShieldController.

Which settings are needed for Mythril/Laser (like max-depth, loop-bound, call-depth-limit, transaction-count and the search strategy) such that it analyzes the main branches of all (or most) functions until their final RETURN? In the intended application, reaching RETURN will trigger checks for that function.


Contents of mythril/analysis/module/modules/function.py:

from mythril.analysis.report import Issue
from mythril.analysis.module.base import DetectionModule

class Function(DetectionModule):
    name = "Show functions visited"
    swc_id = "0"
    description = "Show functions visited"
    pre_hooks = ["RETURN"]

    def _execute(self, state):
        issue = Issue(
            contract=state.environment.active_account.contract_name,
            function_name=state.environment.active_function_name,
            bytecode=state.environment.code.bytecode,
            title="function visited",
            address=0,
            swc_id=Function.swc_id,
            severity="Low",
            description_head="function visited",
            description_tail="",
            gas_used=(state.mstate.min_gas_used, state.mstate.max_gas_used),
        )
        return [issue]

detector = Function()

In mythril/analysis/module/loader.py, I added


from mythril.analysis.module.modules.function import Function
...
    def _register_mythril_modules(self):
        self._modules.extend([ ..., Function()])
gsalzer commented 1 year ago

Here are the functions that I want Mythril to report:

$ solc --optimize --hashes ShieldController.sol
...
======= ShieldController.sol:ShieldController =======
Function signatures:
38af3eed: beneficiary()
75b4d78c: bonus()
dc070657: changeBeneficiary(address)
c1ff808d: changeBonus(uint256)
1e761681: changeDepositAmt(uint256)
decc1a86: changeRefFee(uint256)
5b1dbd80: claimTokens(address,address,address)
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
10adc3f5: deleteShield(address,uint256)
b5677b42: depositAmt()
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)
82e2fadb: getBalances(address,uint256,uint256)
08abe957: getShields()
aa6ca808: getTokens()
0c340a24: governor()
80d85911: initialize(uint256,uint256,uint256)
fc6b3204: isGov()
1c74a301: receiveOwnership()
ec2e0ab3: refFee()
2fce9884: shieldMapping(address)
f2fde38b: transferOwnership(address)
...
norhh commented 1 year ago

Mythril visits all functions. You can prefer using bfs as a strategy to make it prioritise the breadth over depth. If your goal is to simply visit all functions, you can just stick to --transaction-count 1. You can ignore the rest, as they are made to be large enough for most contracts. loop-bound and call-depth-limit might only be needed to be increased in exceptional circumstances.

gsalzer commented 1 year ago

@norhh Thanks for your suggestion. However, the option strategy doesn't make a difference. After maybe a minute Mythril stops, reporting just the constructor and transferFrom, but not a single of the 21 functions that really are in the code.

./myth analyze -f ShieldController.hex -m Function --strategy bfs
==== function visited ====
SWC ID: 0
Severity: Low
Contract: MAIN
Function name: constructor
PC address: 0
Estimated Gas Usage: 1521 - 3920
function visited

--------------------

==== function visited ====
SWC ID: 0
Severity: Low
Contract: MAIN
Function name: gasprice_bit_ether(int128) or transferFrom(address,address,uint256)
PC address: 0
Estimated Gas Usage: 2742 - 4155
function visited

--------------------

--transaction-count doesn't change anything.

gsalzer commented 1 year ago
./myth version
Mythril version v0.23.23
norhh commented 1 year ago

You should add STOP into your pre_hooks, as most functions end with either STOP or REVERT. The regular end for most functions is STOP rather than RETURN

gsalzer commented 1 year ago

@norhh Thanks for the hint regarding STOP, for functions not returning a value. Adding STOP to the pre-hooks lets Mythril find seven more functions. But the remaining 14 functions have a return value, and these functions do terminate with RETURN, like simple getters as bonus().

Adding REVERT yields most other functions, but still not bonus(). But this doesn't help, if I want to perform a check at the regular end of a function.

Might it be the case that Mythril stops its analysis when encountering a state similar to ones seen before? Actually, the state at the end of a getter is not really the same as at the end of other getters, as they return different storage values. Is there a way to let Mythril also explore such parts of the search space?

norhh commented 1 year ago
Screenshot 2023-06-19 at 17 18 33

Simple getters such as bonus() do not have a RETURN or STOP. They just put the corresponding return value at the top of the stack. A RETURN is much more expensive and heavyweight compared to this operation. From the way mythril computes the function name, the RETURN will happen outside the zone of what Mythril considers the code as the getter function. The state.environment.active_function_name will be the library part of the code which it seems to consider as fallback.

norhh commented 1 year ago

A quick fix for fixing the getter issue will be to solve for the first 4 bytes of calldata to get the function signature. I'll push that fix today sometime.

gsalzer commented 1 year ago

Simple getters such as bonus() do not have a RETURN or STOP.

Of course they do, how would the EVM otherwise return control to the calling context, with a return value? Below is the sequence of instructions for bonus(); it ends with RETURN.

000eb: PUSH4 0x75b4d78c ; bonus()
000f0: EQ
000f1: PUSH3 0x0002a8
000f5: JUMPI

002a8: JUMPDEST ; bonus()
002a9: CALLVALUE
002aa: DUP1
002ab: ISZERO
002ac: PUSH3 0x0002b5
002b0: JUMPI

002b1: PUSH1 0x00
002b3: DUP1
002b4: REVERT

002b5: JUMPDEST ; bonus() non-payable
002b6: POP
002b7: PUSH3 0x0002c0
002bb: PUSH1 0x34
002bd: SLOAD    ; S[0x34] 0x0002c0
002be: DUP2     ; 0x0002c0 S[0x34] 0x0002c0
002bf: JUMP

002c0: JUMPDEST ; S[0x34] 0x0002c0
002c1: PUSH1 0x40
002c3: MLOAD    ; fmp S[0x34] 0x0002c0
002c4: SWAP1
002c5: DUP2     ; fmp S[0x34] fmp 0x0002c0
002c6: MSTORE   ; fmp 0x0002c0 # M[fmp] = S[0x34]
002c7: PUSH1 0x20
002c9: ADD      ; fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
002ca: PUSH3 0x00017b
002ce: JUMP

0017b: JUMPDEST
0017c: PUSH1 0x40 
0017e: MLOAD    ; fmp fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
0017f: DUP1     ; fmp fmp fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
00180: SWAP2    ; fmp+0x20 fmp fmp 0x0002c0 # M[fmp] = S[0x34]
00181: SUB      ; 0x20 fmp 0x0002c0 # M[fmp] = S[0x34]  
00182: SWAP1    ; fmp 0x20 0x0002c0 # M[fmp] = S[0x34]
00183: RETURN
norhh commented 1 year ago

The RETURN for such functions happen outside the zone of what Mythril considers as the function bonus(). Such cases are mapped as unknown and Mythril outputs them as fallback.

norhh commented 1 year ago

The bytecode maps the final RETURN which is generic to contract ShieldController is Governable part of the code which other functions can JUMP to and reuse. That's a reason why, sometimes, source code mapping might look like it's broken when bugs map to such locations. So the easiest and an accurate solution is solving for the calldata when such information is required.

gsalzer commented 1 year ago

Thanks for the explanations, I think I now understand. It would be nice if Mythril provided reliable information on such basic information as the function name. To me, this seems essential. May I add this to the wish list for Mythril?

norhh commented 1 year ago

Nevermind, Actually Mythril seems to detect getters at the return by matching the JUMPI address. There is some other issue with it.

norhh commented 1 year ago

Commenting out: 1) address token = address( new ArmorToken(_oracle, _name, _symbol) ); 2) arTokens.push(token); -> because token is defined in the above commented line. will net all the functions with just a single transaction. There is some bug with exploration due to the above lines.

norhh commented 1 year ago

@gsalzer , can you try it with the latest commit?

gsalzer commented 1 year ago

@norhh With STOP and RETURN as the pre-hooks, ./myth analyze -f ShieldController.hex -m Function --transaction-count 1 (also tried --strategy bfs, same result), returns the function names

constructor
controller() <== not quite clear where this comes from
fallback

beneficiary()
depositAmt()
getBalances(address,uint256,uint256)
getShields()
getTokens()
governor()
initialize(uint256,uint256,uint256)
isGov()
refFee()
shieldMapping(address)

but the following ones are still missing:

75b4d78c: bonus()
dc070657: changeBeneficiary(address)
c1ff808d: changeBonus(uint256)
1e761681: changeDepositAmt(uint256)
decc1a86: changeRefFee(uint256)
5b1dbd80: claimTokens(address,address,address)
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
10adc3f5: deleteShield(address,uint256)
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)
1c74a301: receiveOwnership()
f2fde38b: transferOwnership(address)
norhh commented 1 year ago

You can increase the transaction count to 3 to cover more functions (20 in total), but 4 of the functions aren't covered like bonus()

gsalzer commented 1 year ago

@norhh I'm puzzled. Doesn't transaction-count specify the number of successive transactions issued to change the state (and trigger e.g. a vulnerability)? If there are more functions reported for a higher transaction count, then this seems to be rather an incidental by-product and not really expected behavior. Moreover, so far I understood laser as a general-purpose engine for symbolic execution, taint analysis etc. Taken together, Mythril should cover all functions within one transaction, otherwise there is a bug. Or is your point that Mythril actually reaches all the RETURNs and STOPs, but it is just the reporting of the function name that fails? In this case, I'd appreciate it if you could propose an expression that would return, at the RETURN/STOP, the first four bytes of the call data. Hooking the CALLDATA instruction and storing the four bytes for later is probably not such a good idea, as there may be nested calls. Maybe using the path condition and a call to the Z3 solver?

norhh commented 1 year ago

Let's take a function:

function withdrawMoney() {
    require(withdraw_flag == 1)
    transferEverything(address(msg.sender))
}

It's not possible to hit STOP for such functions in a single transaction, because a previous transaction has to set the withdraw_flag to 1. Another example would be:

function kill() {
    require(kill_switch == 1)
    selfdestruct(address(msg.sender))
}

Where the transaction only ends with SELFDESTRUCT when the kill_switch is triggered in a previous transaction.

Also, the issue with bonus() and controller() is that they have the same function JUMPDEST, which Mythril seems to get confused about for some reason.

gsalzer commented 1 year ago

Thanks for the explanation. It is definitely a valid argument in the case of source code/bytecode where Mythril may take the state of the contract, after initialization, into account.

But in the case of --bin-runtime, the withdraw_flag and the kill_switch should/must be handled symbolically (there is no state), so Mythril should reach the RETURN/STOP instruction while exploring the first transaction. However, the result is still the same as if analyzing creation bytecode, with the majority of functions missing.

norhh commented 1 year ago

You'll have to use --unconstrained-storage for it to happen in such a way.

gsalzer commented 1 year ago

Thanks regarding --unconstrained-storage. Now Mythril identifies almost all functions. Functions still missing:

75b4d78c: bonus()
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)

Functions reported even though they are not there:

0x00000001 account_info_rotate_tine(uint256)
0xf77c4791 controller()
fallback

Do you have an explanation why Mythril reports 0x00000001 account_info_rotate_tine(uint256), a function that isn't anywhere in the contract? controller() occurs at least in a contract deployed by the main contract, but 0x00000001 appears nowhere at all.

norhh commented 1 year ago

Yes, there was the issue of bonus() being confused with controller() due to issue of them having same jumpdest. There should be some other similar bug for the rest.

aj3423 commented 1 year ago

So the easiest and an accurate solution is solving for the calldata when such information is required.

How to add a constraint to the calldata? For example constrain the 4-bytes-function-signature == 0x12345678 , I tried something like this but it doesn't work.

constraints.append(
    tx.call_data[0:4] == symbol_factory.BitVecVal(0xa9059cbb, 32)
)
norhh commented 1 year ago

Hi @aj3423, this is an example of how it is done: https://github.com/Consensys/mythril/blob/39b79cae8ab270dbbec12f7c91442af164acfa98/mythril/laser/ethereum/transaction/symbolic.py#L99-L101

aj3423 commented 1 year ago

@norhh Thanks, this is really helpful. Btw, maybe this loop can be optimised, currently it's: https://github.com/Consensys/mythril/blob/39b79cae8ab270dbbec12f7c91442af164acfa98/mythril/laser/ethereum/transaction/symbolic.py#L89-L102

It always generates 4 constraints even with a transaction sequence like [[-1]]:

[Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize)]

I think it should be:

    constraint = Bool(False)
    for func_hash in func_hashes:
        if func_hash == -1:
            # Fallback function without calldata
            constraint = Or(constraint, calldata.size < 4)
        elif func_hash == -2:
            # Receive function
            constraint = Or(constraint, calldata.size == 0)
        else:
            for i in range(FUNCTION_HASH_BYTE_LENGTH):
                constraint = Or(
                    constraint, calldata[i] == symbol_factory.BitVecVal(func_hash[i], 8)
                )
    constraints.append(constraint)