ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0

Code/contractdata data location #13723

Open NunoFilipeSantos opened 1 year ago

NunoFilipeSantos commented 1 year ago

What

Introduce code or contractdata as a new data location for data stored in the deployed bytecode as an extension of the current immutable mechanism.

Why

Lift the restrictions of the current immutable mechanism and extend its benefits to dynamic types.

How

Notes


Resources

  1. 13323

nventuro commented 1 year ago

Will immutable be deprecated? I strongly suggest adding more keywords. Given constant already exists (and causes confusion with immutable), if you prefer code over immutable I'd just replace the current usage with that.

ekpyron commented 1 year ago

Yes, we will transition to one keyword for all of it; the question is rather what it will be. I was rather happy with code until EVM-level discussions about deprecating code introspection and codecopy started, which may lead to codecopy being replaced by a more restricted datacopy in a mid-term future EVM version. That weakens the justification for using code.

My plan was to open the discussion about the keyword to use for this now and try to settle this asap.

ekpyron commented 1 year ago

The main advantage of just sticking with immutable would be that we probably wouldn't need a breaking change then (for any new keyword we'll probably need that).

The main question for that is whether it's only me to whom stuff like this looks very off:

contract C {
  uint[] immutable x;
  uint[] immutable y;
  function f() public {
    uint[] immutable immutableRef = x;
    immutableRef = y;
    ...
  }
}

cameel commented 1 year ago

Some random ideas for the keyword that would avoid tying it to a specific section in the bytecode:

ekpyron commented 1 year ago

Hm... not sure any of them really convince me right away :-).

It may actually be worth considering what would happen if we instead just weakened the restrictions on what counts as constant. Whether a constant is de facto a compile-time constant can always be determined on a case-by-case basis when evaluating the expression, and any use of a constant that's only known after the constructor, in a context that requires the value at compile time, can error out on that use only. So it could in principle be done. For some reason uint[] constant ref = x; ref = y; looks less weird to me than for immutable, not exactly sure why...

That'd go in exactly the opposite direction of @nventuro asking for using more keywords instead, though :-). Would reusing constant really be confusing here?

ekpyron commented 1 year ago

constdata or deploydata or something like that may be options... but ugly... For the record, the opcode that may replace codecopy in EOF would be datacopy, which would only allow reading from a specific data section defined during deployment, so it'd still be a specific section of the deployed bytecode technically... but data really doesn't make for a good keyword :-).

cameel commented 1 year ago

Not sure I like the idea of relaxing constant. Being able to change constants by mistake in init code does not sound like a feature to me :) Sometimes you just want to say something will never change.

d-xo commented 1 year ago

Just want to note here that allowing dynamically sized immutable data into the runtime bytecode will be challenging to deal with from a formal analysis perspective, and that my life (as an author of a symbolic execution engine) would be made easier if this waited until after EOF (where the proper separation of code and data would make this much easier to model).

ekpyron commented 1 year ago

@d-xo The problem is that currently nobody knows if and when EOF will actually happen :-). I take it you want to be able to analyze bytecode even without having sources available, i.e. providing the separation of code and data as a compiler output artifact doesn't help you that much? Would it help if we defined better formal guarantees for distinguishing data from code in bytecode? E.g. we could define a recursive visitation algorithm starting from the entry point that would be guaranteed to cover everything that's code, leaving only data (not sure there's currently a reliable way to do this in all corner cases that would mess with function pointers in inline assembly). This would depend on your exact requirements in any case, since nested code for contract creation can still occur indistinguishably from purely non-code data (even in EOF, unless the latest "no code introspection" changes to EOF are accepted)...
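
For illustration only, here is a rough Python sketch of what a recursive visitation from the entry point could look like. It is not an algorithm used by solc or by any analysis tool. The function names valid_jumpdests and reachable_code are hypothetical, and the sketch assumes that every jump target is pushed by the instruction directly before JUMP or JUMPI. Any other jump (for example one driven by a function pointer) makes it give up, which is exactly the kind of corner case mentioned above. Undefined opcodes are treated as ordinary one-byte instructions for simplicity.

```python
# Opcode values from the EVM instruction set.
STOP, JUMP, JUMPI, JUMPDEST = 0x00, 0x56, 0x57, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F
RETURN, REVERT, INVALID, SELFDESTRUCT = 0xF3, 0xFD, 0xFE, 0xFF
TERMINATORS = {STOP, RETURN, REVERT, INVALID, SELFDESTRUCT}


def push_width(op: int) -> int:
    """Number of immediate bytes that follow opcode `op` (0 if not PUSH1..PUSH32)."""
    return op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0


def valid_jumpdests(bytecode: bytes) -> set:
    """Linear scan that finds JUMPDEST bytes not located inside push data."""
    dests, pc = set(), 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == JUMPDEST:
            dests.add(pc)
        pc += 1 + push_width(op)
    return dests


def reachable_code(bytecode: bytes) -> set:
    """Offsets (opcodes and push immediates) reachable from offset 0.

    Assumption of this sketch: a jump is only resolved when the previous
    instruction is a PUSH. Otherwise ValueError is raised, because the
    walk can no longer guarantee that it covers all code.
    """
    dests = valid_jumpdests(bytecode)
    code, visited, worklist = set(), set(), [0]
    while worklist:
        pc = worklist.pop()
        last_push = None  # value pushed by the directly preceding instruction
        while pc < len(bytecode) and pc not in visited:
            visited.add(pc)
            code.add(pc)
            op = bytecode[pc]
            width = push_width(op)
            if width:
                end = min(pc + 1 + width, len(bytecode))
                code.update(range(pc + 1, end))
                last_push = int.from_bytes(bytecode[pc + 1:end], "big")
                pc += 1 + width
                continue
            if op in (JUMP, JUMPI):
                if last_push is None:
                    raise ValueError(f"unresolved jump at offset {pc}")
                if last_push in dests:
                    worklist.append(last_push)
                if op == JUMP:
                    break  # unconditional jump: no fall-through
            elif op in TERMINATORS:
                break
            last_push = None
            pc += 1
    return code


# PUSH1 0x04, JUMP, <data byte>, JUMPDEST, STOP, <two data bytes>
sample = bytes.fromhex("600456de5b00beef")
code_offsets = reachable_code(sample)
data_offsets = sorted(set(range(len(sample))) - code_offsets)
print(data_offsets)  # [3, 6, 7]
```

Everything the walk does not reach is classified as data. Raising an error on an unresolved jump is a deliberately conservative choice in this sketch: a formal guarantee would need either a restriction on how jumps can be produced or a more precise analysis of the stack.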

ekpyron commented 1 year ago

contractdata may be an option