ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.29k stars 5.77k forks source link

use before assignment of calldata struct instance inside a function does not throw an error #15483

Open haoyang9804 opened 1 month ago

haoyang9804 commented 1 month ago

Description

The following program is a trivially correct one. I copy data from calldata to memory.

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S memory s3;
    s3 = s;
  }
}

Then I mutate this test program into the following:

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2;
    S memory s3;
    s3 = s2; // fail: This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour.
  }
}

This test program causes an error, saying that This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour.. This is understandable. calldata is used to receive data from other contracts. So the calldata in a function without initialization is empty and should be initialized first.

I continue the mutation by initializing s2 first like the below and it passed the compilation.

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2 = s;
    S memory s3;
    s3 = s2; // pass
  }
}

Now I wonder if the compiler can find an incorrect initialization of a calldata slot. So I initialize the calldata slot with itself, an initialized calldata slot, to get the following program:

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2;
    s2 = s2;
    S memory s3;
    s3 = s2; // pass
  }
}

Interestingly, the above test program does not trigger an expected error with message such as This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour. but passed the compilation.

Environment

Steps to Reproduce

Just compile the above programs and you will reproduce the compilation results.

ekpyron commented 1 month ago

This is somewhat intentional as a deliberate workaround to allow complex initialization patterns. I.e. the following is correct in the sense that there can never be an uninitialized access, but it's in general infeasible to detect this at compile-time:

contract C {
    struct A {
        uint256 x;
    }
    struct B {
        A a1;
        A a2;
    }
    function f(B calldata b, bool c) external pure returns (uint256) {
        A calldata a;
        if (c) {
            a = b.a1;
        }
        if (!c) {
            a = b.a2;
        }
        return a.x;
    }
}

Which is why deliberately assigning to self can be used to work around the detection, i.e. this works as deliberate workaround:

contract C {
    struct A {
        uint256 x;
    }
    struct B {
        A a1;
        A a2;
    }
    function f(B calldata b, bool c) external pure returns (uint256) {
        A calldata a;
        a = a;
        if (c) {
            a = b.a1;
        }
        if (!c) {
            a = b.a2;
        }
        return a.x;
    }
}

Now the question is whether allowing this mode of workaround is generally a good thing - the idea was that it's rather unlikely that an assignment to self happens unintentionally and is thus safe to allow for allowing otherwise valid patterns through analysis. But since this is a known workaround against this kind of analysis, we'd likely need to consider changing the behavior here a breaking change.

haoyang9804 commented 1 month ago

I see. Thanks for the clarification.

ekpyron commented 1 month ago

Not sure if this is specifically documented, but what happens for c ? a : b is that you need a common type between a and b - which is the case just if a and b have the same type - or if a's type is implicitly convertible to the type of b - or if b's type is implicitly convertible to the type of a. In the latter two cases, the respective implicit conversion will happen and the ternary gets the "common type".

So e.g. for a in memory and b in calldata, c ? a : b will be in memory and b will be copied from calldata to memory.

haoyang9804 commented 1 month ago

Not sure if this is specifically documented, but what happens for c ? a : b is that you need a common type between a and b - which is the case just if a and b have the same type - or if a's type is implicitly convertible to the type of b - or if b's type is implicitly convertible to the type of a. In the latter two cases, the respective implicit conversion will happen and the ternary gets the "common type".

So e.g. for a in memory and b in calldata, c ? a : b will be in memory and b will be copied from calldata to memory.

Following this understanding, I found an issue:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    st = cd; // pass, S calldata -> S storage
    true ? st : cd; // fail: True expression's type struct C.S storage pointer does not match false expression's type struct C.S calldata.
  }
}

Does it reveal the inconsistency of type checking?

ekpyron commented 2 weeks ago

That's a tricky example :-). st in st = cd, i.e. on the left-hand-side of an assignment, is a storage reference - assigning to it will trigger a deep copy from cd to st. However st in true ? st : cd (so in right-hand-side context) is a storage pointer. And there is no conversion from S calldata -> S storage (pointer).

You can also see the following fail:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    S storage sp = st;
    sp = cd;
  }
}

If that passed, then c ? st : cd would also be expected to pass - but it doesn't.

Now that's the technical explanation in terms of the type system. But on a higher-level, it would be very weird behaviour if c ? st : cd was allowed and typed as S storage. The only way to interpret that semantically would be that if c is false, it would be a storage pointer to st, but then what happens if c is true? You need a storage pointer then as well - where should we point? You certainly won't expect cd to be copied to st in that case.

The only way to make c ? st : cd work semantically would be to copy both arguments to memory - but the current typing rules don't allow for that, and for good reason, since you're not guaranteed that each pair of types has a unique type that both are implicitly convertible to.

haoyang9804 commented 2 weeks ago

That's a tricky example :-). st in st = cd, i.e. on the left-hand-side of an assignment, is a storage reference - assigning to it will trigger a deep copy from cd to st. However st in true ? st : cd (so in right-hand-side context) is a storage pointer. And there is no conversion from S calldata -> S storage (pointer).

You can also see the following fail:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    S storage sp = st;
    sp = cd;
  }
}

If that passed, then c ? st : cd would also be expected to pass - but it doesn't.

Now that's the technical explanation in terms of the type system. But on a higher-level, it would be very weird behaviour if c ? st : cd was allowed and typed as S storage. The only way to interpret that semantically would be that if c is false, it would be a storage pointer to st, but then what happens if c is true? You need a storage pointer then as well - where should we point? You certainly won't expect cd to be copied to st in that case.

The only way to make c ? st : cd work semantically would be to copy both arguments to memory - but the current typing rules don't allow for that, and for good reason, since you're not guaranteed that each pair of types has a unique type that both are implicitly convertible to.

Really thanks for your patient clarification. It helps a lot for my building the semantics model of Solidity to generate Solidity programs.