crytic / slither

Static Analyzer for Solidity and Vyper
https://blog.trailofbits.com/2018/10/19/slither-a-solidity-static-analysis-framework/
GNU Affero General Public License v3.0

[Bug-Candidate]: Consider non-ASCII when source mapping #1164

Open · ghost opened this issue 2 years ago

ghost commented 2 years ago

Describe the issue:

Hey!

I just noticed that source_mapping is misaligned when a .sol file contains non-ASCII characters. "Misaligned" may not be quite the right word, because the result depends on how the file is opened in Python. If I read it the way Slither and crytic-compile do (a normal text read with encoding='utf-8'), the offsets are off, but they work fine when I read the file as bytes ('rb'). This is because solc's source-mapping offsets are byte offsets into the file, whereas indexing a decoded Python str counts characters.

I'm not sure whether this has any implications for Slither itself, but it is handy to know when slicing the source based on Slither's output.
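The mismatch can be reproduced with plain Python, without Slither or solc. This is a minimal sketch that assumes, as the Solidity documentation describes for source mappings, that offsets count bytes of the UTF-8 encoded file; the tiny contract and the way the offset is computed here are illustrative, not taken from Slither's output.

```python
# A byte offset used as a str index lands too late when non-ASCII
# characters precede it.
source = "// 有趣的\ncontract A {}\n"
encoded = source.encode("utf-8")

# "有趣的" is 3 characters but 9 bytes in UTF-8
print(len("有趣的"), len("有趣的".encode("utf-8")))

# Byte offset and byte length of the contract, as a byte-counting compiler would report them
byte_start = encoded.index(b"contract")
length = len(b"contract A {}")

# Wrong: treating the byte offset as a character index (shifted by 9 - 3 = 6 characters)
print(repr(source[byte_start:byte_start + length]))

# Right: slice the bytes first, then decode the slice
print(repr(encoded[byte_start:byte_start + length].decode("utf-8")))
```

The wrong slice starts with "ct A {}", the same six-character shift as in the Slither output further down.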

Code example to reproduce the issue:

from slither.slither import Slither

# Write a simple test contract containing non-ASCII characters.
# "有趣的" is 3 characters but 9 bytes in UTF-8 (\xe6\x9c\x89\xe8\xb6\xa3\xe7\x9a\x84),
# so it shifts byte offsets relative to character offsets.
with open('test.sol', 'w', encoding='utf-8') as sol:
    sol.write(
        '''
        // 有趣的
        contract A {

            uint public x;

            constructor() public {
                x = 1;
            }

        }
        '''
    )

# Parse with Slither
slither = Slither('test.sol')

# Open the file as UTF-8 text, as Slither does, and print the contract's source slice
with open('test.sol', encoding='utf-8') as sol_file:
    source_code = sol_file.read()
    contract_mapping = slither.contracts[0].source_mapping
    start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
    print(source_code[start:end])

This prints the wrong part of the source code. The slice is shifted six characters, because the 3 characters of 有趣的 take 9 bytes in UTF-8:

ct A {

            uint public x;

            constructor() public {
                x = 1;
            }

        }

Reading the file in binary mode, slicing the bytes, and then decoding the slice gives the correct result:

with open('test.sol', 'rb') as sol_file: # read binary
    source_code = sol_file.read()
    contract_mapping = slither.contracts[0].source_mapping
    start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
    print(source_code[start:end].decode('utf-8')) # decode binary

Will print:

contract A {

            uint public x;

            constructor() public {
                x = 1;
            }

        }
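If the source is already held as a decoded str, the byte-based offsets can be converted instead of re-reading the file. This is a sketch; byte_to_char_offset and slice_by_bytes are hypothetical helpers, not part of Slither or crytic-compile, and they assume the offsets refer to the UTF-8 encoding of the text.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a byte offset into the UTF-8 encoding of `text` to a str index."""
    # Decoding the prefix raises UnicodeDecodeError if the offset splits a
    # multi-byte character, which would indicate a bad offset.
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


def slice_by_bytes(text: str, start: int, length: int) -> str:
    """Return the part of `text` covered by a byte-based (start, length) span."""
    char_start = byte_to_char_offset(text, start)
    char_end = byte_to_char_offset(text, start + length)
    return text[char_start:char_end]


if __name__ == "__main__":
    source = "// 有趣的\ncontract A {}\n"
    # "contract A {}" starts at byte 13 and is 13 bytes long
    print(byte_to_char_offset(source, 13))
    print(slice_by_bytes(source, 13, 13))
```

With the dict-style source_mapping used above, this would be called as slice_by_bytes(source_code, contract_mapping['start'], contract_mapping['length']). Each call re-encodes the whole text, so for many lookups, keeping the bytes and slicing them directly (as in the binary-read workaround) is cheaper.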

Version:

Slither: 0.8.2

Relevant log output:

No response

montyly commented 2 years ago

Hi @CodeGodz. That's a good catch, thanks for reporting it.

It looks like you're right, and we should change the encoding in both Slither and crytic-compile.
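One way such a change could look, sketched under assumptions: keep each source file as raw bytes and derive snippets and line/column numbers from byte offsets, decoding only the selected span. The SourceFile class and its methods are hypothetical and are not Slither's or crytic-compile's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class SourceFile:
    """Hypothetical wrapper that keeps a source file as raw UTF-8 bytes."""

    data: bytes

    @classmethod
    def read(cls, path: str) -> SourceFile:
        # Read bytes so that byte offsets index the file directly
        return cls(Path(path).read_bytes())

    def snippet(self, start: int, length: int) -> str:
        # Slice by bytes first, decode only the selected span
        return self.data[start:start + length].decode("utf-8")

    def line_col(self, offset: int) -> tuple[int, int]:
        """1-based line and character column for a byte offset."""
        prefix = self.data[:offset]
        line = prefix.count(b"\n") + 1
        line_start = prefix.rfind(b"\n") + 1  # 0 when there is no earlier newline
        # Count characters, not bytes, from the start of the line
        column = len(prefix[line_start:].decode("utf-8")) + 1
        return line, column


if __name__ == "__main__":
    src = SourceFile("// 有趣的\ncontract A {}\n".encode("utf-8"))
    print(src.snippet(13, 13))  # the contract's source text
    print(src.line_col(13))     # start of "contract" on line 2
    print(src.line_col(9))      # "的" starts at byte 9 but is the 6th character
```

Counting the column in characters rather than bytes is a design choice for display; tools that expect byte columns would skip the decode in line_col.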