I just noticed that source_mapping is misaligned when a .sol file containing non-ASCII characters is opened. "Misaligned" may not be the right word, since the result depends on how the .sol file is opened in Python. If I open it the way Slither and crytic-compile do, with a normal text read and encoding='utf-8', the offsets are misaligned; reading the file as bytes ('rb') works fine. The offsets appear to count UTF-8 bytes, while indexing a decoded str counts characters.
I'm not sure whether this has any implications for Slither itself, but it is handy to know when slicing the source based on Slither output.
Code example to reproduce the issue:
from slither.slither import Slither
# Write a simple test contract containing non-ASCII characters.
# The three characters in the Solidity comment take 9 bytes in UTF-8
# (\xe6\x9c\x89 \xe8\xb6\xa3 \xe7\x9a\x84), which is what causes the shift.
with open('test.sol', 'w', encoding='utf-8') as sol:
    sol.write(
        '''
// 有趣的
contract A {
    uint public x;
    constructor() public {
        x = 1;
    }
}
'''
    )
# Parse with Slither
slither = Slither('test.sol')
# Open the file with utf-8 as done in Slither and print the contract positions
with open('test.sol', encoding='utf-8') as sol_file:
    source_code = sol_file.read()
contract_mapping = slither.contracts[0].source_mapping
start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
print(source_code[start:end])
Will print the incorrect part of the source code:
ct A {
    uint public x;
    constructor() public {
        x = 1;
    }
}
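The size of the shift can be checked without Slither. This is a minimal sketch, assuming the offsets in source_mapping count UTF-8 bytes: each of the three characters in the comment encodes to 3 bytes, so a byte offset runs 6 positions ahead of the matching character index, which is why the first six characters of "contract" are cut off.

```python
# Standalone check, no Slither needed: compare character and byte counts
# for the non-ASCII comment text used in the test contract.
comment = '有趣的'
n_chars = len(comment)                  # code points in the decoded str
n_bytes = len(comment.encode('utf-8'))  # bytes as stored on disk
shift = n_bytes - n_chars               # how far a byte offset overshoots
print(n_chars, n_bytes, shift)          # prints: 3 9 6
```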
Reading the file in binary mode, slicing the bytes, and then decoding gives the correct result:
with open('test.sol', 'rb') as sol_file:  # read binary
    source_code = sol_file.read()
contract_mapping = slither.contracts[0].source_mapping
start, end = contract_mapping['start'], contract_mapping['start'] + contract_mapping['length']
print(source_code[start:end].decode('utf-8'))  # decode binary
Will print:
contract A {
    uint public x;
    constructor() public {
        x = 1;
    }
}
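For anyone who needs to slice source text from such offsets, here is a rough sketch of two helpers. The names slice_by_byte_offsets and byte_to_char_offset are hypothetical and not part of Slither, and the sample source and offsets are computed locally rather than taken from Slither output. It assumes the offsets are UTF-8 byte offsets that fall on character boundaries.

```python
def slice_by_byte_offsets(source_bytes: bytes, start: int, length: int) -> str:
    """Slice raw UTF-8 source by byte offset, then decode the slice."""
    return source_bytes[start:start + length].decode('utf-8')


def byte_to_char_offset(source_bytes: bytes, byte_offset: int) -> int:
    """Translate a byte offset into an index into the decoded str.

    Raises UnicodeDecodeError if byte_offset falls inside a multi-byte character.
    """
    return len(source_bytes[:byte_offset].decode('utf-8'))


# Hypothetical stand-in for test.sol; the offsets are derived here for illustration.
source_text = '// 有趣的\ncontract A {\n    uint public x;\n}\n'
source_bytes = source_text.encode('utf-8')
start = source_bytes.index(b'contract')  # byte offset of the contract
length = len(source_bytes) - start - 1   # up to the closing brace, without the final newline

snippet = slice_by_byte_offsets(source_bytes, start, length)
char_start = byte_to_char_offset(source_bytes, start)
print(snippet)
print(start, char_start)
```

The first helper is enough when the raw bytes are at hand; the second is useful when the source has already been decoded to a str and only the index needs correcting.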