ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.28k stars 5.77k forks source link

Solidity provides invalid `start` in the source map #12915

Open dongmingh opened 2 years ago

dongmingh commented 2 years ago

Description

The value of start in the sourceMap is greater than the length of source file. The start is the byte offset to the start of the range in the source file. Hence this value should be less than the byte length of source file.

Environment

Steps to Reproduce

The steps to reproduce this issue are given in the truffle issue (https://github.com/trufflesuite/truffle/issues/4616). After stepping to 8240 with truffle debugger command ; 8239, use the following commands to see the issue:

  1. issue command :!<sourcemapping.current.instruction> to see the start of the first instruction is 28307

    debug(inline_config:0x1c796d1e...)> :!<sourcemapping.current.instruction>
    {
    pc: 0,
    name: 'PUSH1',
    pushData: '0x80',
    index: 0,
    jump: '-',
    start: 28307,
    length: 3774,
    file: 0,
    modifierDepth: 0,
    range: {
    start: {
      line: null,
      column: null
    },
    end: {
      line: null,
      column: null
    }
    }
    }
  2. Issue command :!<sourcemapping.current.sourceMap> to see the source map which shows the start of the first instruction is 28307

    debug(inline_config:0x1c796d1e...)> :!<sourcemapping.current.sourceMap>
    '28307:3774:0:-:0;;;;5:9:-1;2:2;;;27:1;24;17:12;2:2;28307:3774:0;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;12:1:-1;9;2:12;28367:53:0;;;:::i;:::-;;;;;;;;;;;;;;;;;;;
  3. issue command :!<sourcemapping.current.source>.source.length to see the length of source is 27370 which is less than the start in the source map above.

    debug(inline_config:0x1c796d1e...)> :!<sourcemapping.current.source>.source.length
    27370
cameel commented 2 years ago

We'll need more information to investigate this.

The contract that the transaction from https://github.com/trufflesuite/truffle/issues/4616 interacts with is not source verified (neither on etherscan nor on sourcify). Is the source available? If it's just the bytecode then this does not include a source map - where is Truffle getting that source map from?

Basically, we need the input that makes the compiler produce that invalid source map to get to the root of the problem. With the info given here I'm not even sure that this source map is actually coming from the compiler.

dongmingh commented 2 years ago

@cameel thanks for looking into this. The source file is available in etherscan. The solc version is v0.6.8+commit.0bbfe453 and optimization is enabled.

cameel commented 2 years ago

Thanks!

I investigated this and the reason seems to be that the source that etherscan is presenting is not the exact source that was used to generate the source map that's being shown there. It's a bit shorter. The difference must be in the comments or whitespace since it did not affect the bytecode but only the source map.

To check this you can verify the contract manually. Copy the source from etherscan to a file called ChiToken.sol and the bytecode to ChiToken.bin-runtime, then run:

curl --remote-name https://binaries.soliditylang.org/linux-amd64/solc-linux-amd64-v0.6.8+commit.0bbfe453
chmod +x solc-linux-amd64-v0.6.8+commit.0bbfe453

mkdir out/
./solc-linux-amd64-v0.6.8+commit.0bbfe453 \
    ChiToken.sol \
    --optimize \
    --bin \
    --combined-json srcmap-runtime \
    --output-dir out/

diff --color --unified \
    <(hexdump ChiToken.bin) \
    <(hexdump out/ChiToken.bin)

The diff shows that this does recreate the bytecode correctly (only the metadata hash at the very end differs). The source maps are different though.

If you try to diff them, you'll see that these source maps still have structure that's too similar for it to be just a coincidence. They both seem to match that bytecode. To see that, I recommend passing both through tr ';' '\n' command (which will replace semicolons with newlines to make it diffable) and using some visual diff tool. It clearly shows that there are many numbers that are identical and those that differ are mostly just shifted.

The similarity is even greater if you get the source submitted to sourcify.dev: ChiToken.sol. The only difference from the Etherscan source is that it does not have the Submitted for verification at Etherscan.io and Signed raw transaction for chainId comments so it's much shorter. The map that you get from it is oddly much closer to what etherscan is showing.

So the source map is probably correct, just created from a different input. I'm not sure if that's always the case for Etherscan or just some kind of bug in how their system fed that specific contract to the compiler. A workaround would be for Truffle to get the map from the compiler instead of relying on the one provided by Etherscan.

haltman-at commented 2 years ago

Jumping in briefly to provide some clarity here, since I realize things may not have been presented in an entirely clear manner here -- there definitely is a compiler issue here (although whether it's still present in current versions, who knows, given that so far we've only reproduced it on 0.6.x). It's true that the source we compile isn't exactly the same as the source downloaded from Etherscan, but that's irrelevant (or at least, it's irrelevant to the question of whether there's a compiler issue; it's certainly relevant to the question of reproducing it).

The reason it's irrelevant is that, while we download the source from Etherscan, we do not download the sourcemap from Etherscan. Rather, we compile the downloaded source and use the source maps output by the compiler. We do this precisely to avoid any potential mismatches. So, it is not possible that the source map is correct but for a different input, because we are getting it directly from the compilation of the input, not downloaded separately from Etherscan.

That does leave the problem of reproducing it in a manner not dependent on our tools. Obviously this is possible -- in the worst case, we could just hand you a standard JSON input file that exhibits the problem -- but I was hoping for something more convenient. Annoyingly my attempts at more convenient repro haven't worked out so far, but @dongmingh says he was able to recently able to confirm the problem via the standard JSON approach, so, uh, we ought to have something for you soon that demonstrates that there is a real compiler issue here (or was, anyway; that still leaves the question of whether it still exists in current versions).

Btw, if you're wondering how exactly the source we compile might differ from the source submitted to Etherscan -- not that it's relevant, but if you're wondering:

  1. For single-source cases (like this one), we do include a date header comment header, but, because we don't have a good way to get the date, we just use "20XX-XX-XX" in place of a more specific date.
  2. Any linebreaks in the source we compile will be Windows-style linebreaks... except the ones in the date header, which will be system-style.

Those should be the only two differences. Also, IIRC, in the case where the source map pointed past the end, it pointed way past the end, so even if we were using the wrong source map I don't think that would be sufficient to explain the discrepancy (although I'm going by memory here so it's possible I'm mistaken).

haltman-at commented 2 years ago

OK, sorry about taking a while to get back to this. I've put up a gist here with the exact standard JSON input that's causing the problem; I just had Truffle log the exact string it's sending to the compiler. I'm pretty confused about why I've been unable to reproduce this any other way. Apologies for how inconvenient and non-minimal this example is, not to mention what an old compiler version it's on; but so far I've been utterly unable to reproduce this any other way. Hoping this is helpful all the same.

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to inactivity for the last 90 days. It will be automatically closed in 7 days.

eggplantzzz commented 1 year ago

So I think I discovered another tx that has a bad source map which is reported in this issue.

ibollanos commented 1 year ago

We (Dedaub) have also reviewed this issue because it seemed relevant to a few inconsistencies we were noticing in our internal tooling. We do not believe that this is a compiler bug. This seems to be related to how the Truffle debugger interpreted the information conveyed by the source mappings.

The main point to be made is that both the s and l fields are byte offsets. Implementations that treat those fields as character offsets will break whenever a source file contains multi-byte UTF characters. The contract that is mentioned at the beginning of the issue has quite a few of them inside the comment at the top of its source.

It seems that Truffle has addressed this with this commit: https://github.com/trufflesuite/truffle/commit/8e27d5c4f9412729503640a2e28fb39b6f5b3cb6

However, Truffle seems to be still treating source mapping fields as character offsets in some areas of their code, which is why contracts with multi-byte characters are still not displayed properly in their debugger. Ideally, the check should be range.start >= Buffer.from(source).length