Open tuturu-tech opened 6 months ago
Example call sequence and contract:
{
"call": {
"from": "0x0000000000000000000000000000000000010000",
"to": "0xa647ff3c36cfab592509e13860ab8c4f28781a66",
"nonce": 1,
"value": "0x0",
"gasLimit": 12500000,
"gasPrice": "0x1",
"gasFeeCap": "0x0",
"gasTipCap": "0x0",
"data": "0xe8f0d2db0000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000005f65fc28be74620a5d4449a5b21393f93ac0e9089234a90b704852792227fa270ba3929fe4153840f9063d73e2c0518fd91b0dacd93edd00a47633da9ec576747da9b64ebafae9cbf6a2af2669c11669989c434713f73272573b326f98bb0ceb00",
"dataAbiValues": {
"methodSignature": "check_specific_string(string)",
"inputValues": [
"e\ufffd(\ufffdtb\n]DI\ufffd\ufffd\u0013\ufffd\ufffd:\ufffd\ufffd\b\ufffd4\ufffd\u000bpHRy\"'\ufffd'\u000b\ufffd\ufffd\ufffd\ufffd\u00158@\ufffd\u0006=s\ufffd\ufffdQ\ufffd\ufffd\u001b\r\ufffd\ufffd\u003e\ufffd\u0000\ufffdv3ڞ\ufffdvt}\ufffd\ufffdN\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0026i\ufffd\u0016i\ufffd\ufffdCG\u0013\ufffd2rW;2o\ufffd\ufffd\f\ufffd"
]
},
"AccessList": null,
"SkipAccountChecks": false
},
"blockNumberDelay": 37521,
"blockTimestampDelay": 390361
},
pragma solidity ^0.8.0;
// medusa fuzz --compilation-target . --target-contracts "Nonutf" --corpus-dir corpus-utf
contract Nonutf {
function check_specific_string(string memory provided) public {
require(bytes(provided).length > 0);
if (keccak256(bytes(provided)) == keccak256(bytes(unicode"00FC"))) {
assert(false);
}
}
}
If we take the encoded data (ignoring offset and length) and convert it to string via chisel we can see it contains non-utf8 characters:
➜ bytes memory test = hex"65fc28be74620a5d4449a5b21393f93ac0e9089234a90b704852792227fa270ba3929fe4153840f9063d73e2c0518fd91b0dacd93edd00a47633da9ec576747da9b64ebafae9cbf6a2af2669c11669989c434713f73272573b326f98bb0ceb00"
➜ string(test)
Type: string
├ UTF-8: e�(�tb
]DI����:��4�
pHRy"'�'
�>��v3ڞ�vt}��N�������&i�i��CG�2rW;2o��
�
Since Solidity isn't limited to utf8 I think it's fine to keep non-utf8 encoding, but tools might have a hard time parsing the decoded arguments the way they are currently. Using some encoding format that can be easily converted to unicode might be better.
I think we can just remove the call to QuoteToASCII
here.https://github.com/crytic/medusa/blob/6750032502ed64952435dc408be3d8a1a107eb5c/fuzzing/valuegeneration/abi_values.go#L444
It does not seem like the decoding unquotes it
https://github.com/crytic/medusa/blob/6750032502ed64952435dc408be3d8a1a107eb5c/fuzzing/valuegeneration/abi_values.go#L792-L797
Currently
string
types are just byte arrays which don't seem to adhere to any specific encoding, this makes it difficult to correctly decode strings from corpus call sequences in tools such asfuzz-utils
. Using a standard encoding such as utf-8 would ensure other tools can correctly process the corpus.