crytic / medusa

Parallelized, coverage-guided, mutational Solidity smart contract fuzzing, powered by go-ethereum
https://www.trailofbits.com/
GNU Affero General Public License v3.0
292 stars 37 forks source link

Adjust corpus call sequence encoding of `string` types so that they are utf-8 compatible #339

Open tuturu-tech opened 6 months ago

tuturu-tech commented 6 months ago

Currently string types are just byte arrays which don't seem to adhere to any specific encoding, this makes it difficult to correctly decode strings from corpus call sequences in tools such as fuzz-utils. Using a standard encoding such as utf-8 would ensure other tools can correctly process the corpus.

tuturu-tech commented 3 months ago

Example call sequence and contract:

 {
  "call": {
   "from": "0x0000000000000000000000000000000000010000",
   "to": "0xa647ff3c36cfab592509e13860ab8c4f28781a66",
   "nonce": 1,
   "value": "0x0",
   "gasLimit": 12500000,
   "gasPrice": "0x1",
   "gasFeeCap": "0x0",
   "gasTipCap": "0x0",
   "data": "0xe8f0d2db0000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000005f65fc28be74620a5d4449a5b21393f93ac0e9089234a90b704852792227fa270ba3929fe4153840f9063d73e2c0518fd91b0dacd93edd00a47633da9ec576747da9b64ebafae9cbf6a2af2669c11669989c434713f73272573b326f98bb0ceb00",
   "dataAbiValues": {
    "methodSignature": "check_specific_string(string)",
    "inputValues": [
     "e\ufffd(\ufffdtb\n]DI\ufffd\ufffd\u0013\ufffd\ufffd:\ufffd\ufffd\b\ufffd4\ufffd\u000bpHRy\"'\ufffd'\u000b\ufffd\ufffd\ufffd\ufffd\u00158@\ufffd\u0006=s\ufffd\ufffdQ\ufffd\ufffd\u001b\r\ufffd\ufffd\u003e\ufffd\u0000\ufffdv3ڞ\ufffdvt}\ufffd\ufffdN\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0026i\ufffd\u0016i\ufffd\ufffdCG\u0013\ufffd2rW;2o\ufffd\ufffd\f\ufffd"
    ]
   },
   "AccessList": null,
   "SkipAccountChecks": false
  },
  "blockNumberDelay": 37521,
  "blockTimestampDelay": 390361
 },
pragma solidity ^0.8.0;

// medusa fuzz --compilation-target . --target-contracts "Nonutf" --corpus-dir corpus-utf
contract Nonutf {
    function check_specific_string(string memory provided) public {
        require(bytes(provided).length > 0);
        if (keccak256(bytes(provided)) == keccak256(bytes(unicode"00FC"))) {
            assert(false);
        }
    }
}

If we take the encoded data (ignoring offset and length) and convert it to string via chisel we can see it contains non-utf8 characters:

➜ bytes memory test = hex"65fc28be74620a5d4449a5b21393f93ac0e9089234a90b704852792227fa270ba3929fe4153840f9063d73e2c0518fd91b0dacd93edd00a47633da9ec576747da9b64ebafae9cbf6a2af2669c11669989c434713f73272573b326f98bb0ceb00"
➜ string(test)
Type: string
├ UTF-8: e�(�tb
]DI����:��4�
            pHRy"'�'
�>��v3ڞ�vt}��N�������&i�i��CG�2rW;2o��
                                      �

Since Solidity isn't limited to utf8 I think it's fine to keep non-utf8 encoding, but tools might have a hard time parsing the decoded arguments the way they are currently. Using some encoding format that can be easily converted to unicode might be better.

0xalpharush commented 3 months ago

I think we can just remove the call to QuoteToASCII here.https://github.com/crytic/medusa/blob/6750032502ed64952435dc408be3d8a1a107eb5c/fuzzing/valuegeneration/abi_values.go#L444 It does not seem like the decoding unquotes it https://github.com/crytic/medusa/blob/6750032502ed64952435dc408be3d8a1a107eb5c/fuzzing/valuegeneration/abi_values.go#L792-L797