nventuro opened 2 years ago
love the code sample
Since it's not really a dynamic array, a better syntax would be `IERC20[10] internal immutable _tokens;`.
Thanks for the issue - this has been on our longer term agenda ever since we introduced immutables :-) (I'm actually surprised that it wasn't requested much earlier ;-)). At least I hadn't considered a maximal length for such dynamic immutable arrays - that's an interesting idea that could make this easier (but I could also imagine that fully dynamic arrays would not be that much harder actually).
One thing to be aware of is that this will not be as cheap as value-type immutables, because we won't be able to inline the values directly - in most cases I'd expect the values to end up being `codecopy`d to memory before use, which has some overhead (although it will be significantly cheaper than storage of course).
In retrospect, I'd much rather have introduced what we now have as `immutable`s as a new data location called `code` (besides `memory`, `calldata` and `storage`), which would convey more clearly what actually happens and would make it less of a concern to allow multiple assignments during construction (there's no technical reason for disallowing that - during construction the values just live in memory - it's just contrary to the meaning of "immutable"...)
> (but I could also imagine that fully dynamic arrays would not be that much harder actually).

I haven't given this much thought, so I can't quite comment on the complexity of such an implementation. In all scenarios where I'd use this, however, a maximum length (which also means there's some wasted bytecode on the unused slots) is good enough.
> One thing to be aware of is that this will not be as cheap as value-type immutables, because we won't be able to inline the values directly - in most cases I'd expect the values to end up being `codecopy`d to memory before use, which has some overhead (although it will be significantly cheaper than storage of course).

Yes, while the bytecode size for immutable variables in my example would be the same, the actual code that accesses them would not (both size and gas would differ). I imagine instead of a push we'd just have `codecopy` + `mload` (with no need to expand allocated memory), which would use very little gas (except for `.length`, which would be a push).
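For illustration, such an access could boil down to something like the following sketch (illustrative only: `readWord` and `dataOffset` are made-up names, with `dataOffset` standing in for the statically known position of the array's words inside the runtime code):

```solidity
contract CodeArrayReadSketch {
    // Rough sketch of the expected access pattern: copy one word of the
    // runtime code into scratch space and load it onto the stack.
    function readWord(uint256 dataOffset, uint256 i) internal view returns (uint256 value) {
        assembly {
            // copy 32 bytes from code[dataOffset + i * 32] into memory[0..31]
            codecopy(0, add(dataOffset, mul(i, 0x20)), 0x20)
            // load the word onto the stack; no memory expansion is needed
            value := mload(0)
        }
    }
}
```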
Just would like to bring up a particular use case of mapping to an immutable array, like in `mapping(uint256 => uint256[10] immutable) data`. This may find applications in ordered NFT batch standards.
Hm, I'm not quite sure how you'd do mappings, as they are currently extremely storage-based (since the key is just a 256-bit hash, there's no notion of the 'capacity' of the map). But you'd likely be able to implement such a thing yourself using multiple immutable arrays (a few with the actual data, and one that tells you which mapping to use), though it'd require some boilerplate similar to the one in my code sample, even if this feature was implemented.

I'd limit discussion here to the feature discussed in the opening post.
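For concreteness, a rough sketch of that workaround (illustrative names, shrunk to two keys with three values each; with today's compiler the "arrays" are just individual value-type immutables):

```solidity
contract ImmutableLookup {
    // data "array" A
    uint256 private immutable _a0;
    uint256 private immutable _a1;
    uint256 private immutable _a2;
    // data "array" B
    uint256 private immutable _b0;
    uint256 private immutable _b1;
    uint256 private immutable _b2;
    // per-key selector: which data "array" each key maps to (0 = A, 1 = B)
    uint256 private immutable _sel0;
    uint256 private immutable _sel1;

    constructor(uint256[3] memory a, uint256[3] memory b, uint256 sel0, uint256 sel1) {
        _a0 = a[0]; _a1 = a[1]; _a2 = a[2];
        _b0 = b[0]; _b1 = b[1]; _b2 = b[2];
        _sel0 = sel0;
        _sel1 = sel1;
    }

    function get(uint256 key, uint256 i) public view returns (uint256) {
        require(key < 2 && i < 3, "out of bounds");
        uint256 sel = key == 0 ? _sel0 : _sel1;
        if (sel == 0) return i == 0 ? _a0 : i == 1 ? _a1 : _a2;
        return i == 0 ? _b0 : i == 1 ? _b1 : _b2;
    }
}
```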
> Hm, I'm not quite sure how you'd do mappings, as they are currently extremely storage-based (since the key is just a 256-bit hash, there's no notion of the 'capacity' of the map).

I think it's similar to `mapping(uint256 => uint256[10]) data;`, except the values in a mapped array cannot be modified.
> But you'd likely be able to implement such a thing yourself using multiple immutable arrays (a few with the actual data, and one that tells you which mapping to use),

Is this an immutable struct?
Hi @ekpyron! I keep running into scenarios where such a feature would be extremely useful :) Is there an estimated timeline for this feature? Is there any work the community can do to push it forward (e.g. come up with a more detailed spec, begin implementation work, etc.)?
@nventuro Unfortunately, there hasn't been progress on this and we don't have an estimated timeline for it. For now I moved it up in our design backlog, which will hopefully lead to us discussing it in one of our calls next week.
The main problem is that the implementation of this won't be trivial, and I guess either @chriseth or I would have to do it, and I'm not sure when we'll find the time.
Conceptually, the main question I think is whether we want to enforce an "assign once and do not read" logic for dynamic immutable types like we do for immutables of value type.
It would probably in fact be easier to implement this if we were to treat the reference-type immutable just like a `memory` variable during the constructor, and in that sense allow reassignments and reads from it. However, it'd be counterintuitive for them to be called `immutable` while they can be reassigned, even if only during creation (@chriseth objected to such plans in the past IIRC).
We just discussed this in our design call and decided to make it a goal for Q2 2022.
Current rough sketch of a plan is:
- `bytes` only at first, and then extend.
- Keep them as `immutable`s, but introduce a data location `code` for them that can be passed around by reference.
- `code` variables during construction will behave just like `memory` variables, i.e. they will be normally writable and readable in memory during contract creation.

However, we will probably have some iterations on this plan as we start implementing this.
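To make the sketch above concrete, the syntax under discussion might end up looking roughly like this (purely hypothetical illustration - the `code` data location does not exist in any released compiler):

```solidity
// Hypothetical illustration of the plan above - this does not compile
// with any released solc version.
contract Example {
    // a reference-type immutable, ultimately stored in the runtime code
    bytes internal immutable _payload;

    constructor(bytes memory payload) {
        // during construction it would behave like a memory variable,
        // i.e. it would be both writable and readable
        _payload = payload;
    }

    function firstByte() external view returns (bytes1) {
        // a local reference using the proposed `code` data location
        bytes code p = _payload;
        return p[0];
    }
}
```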
The `code` part sounds very interesting! Looking forward to how this turns out, it'll likely be the tipping point that makes me migrate to 0.8 :grin:
A `code` location would be an awesome addition. I was going to say before that "assign once and do not read" is not the crucial part that people care about. It's all about the efficiency of access.
Here's an extreme example: 8 sets of 30 immutable values each that could be implemented as code/immutable arrays using this feature.
How would the immutable values be stored in the code? Having dynamically sized data in the middle of the bytecode would make symbolic analysis of such bytecode even more annoying (perhaps even impossible?) than it already is.
Here are some things that are already difficult to deal with when symbolically executing (given the existence of dynamic constructor args and static immutables):
These can currently be worked around as long as dynamic data exists at the end of the bytecode: in that case we can produce an index beyond which all bytecode is known to be a symbolic value, and error out if we attempt to execute past this index (symbolic execution of symbolic opcodes is not a tractable problem). I'm not sure how I would handle this if code could contain arbitrary pieces of symbolic data with an unknown length at any location in the code.
side note: EIP-3540 cannot come soon enough...
I'd say that the immutable values would definitely be stored after the opcodes.
Hm... thinking about actually implementing this, it may be a problem to parse a `code` data location without making `code` a proper keyword, which would be breaking... that's a bit unfortunate...
In that case could the keyword be `immutable`?
> In that case could the keyword be `immutable`?

That's a bit weird for a full proper data location...
```solidity
contract C {
    uint[] immutable x;
    uint[] immutable y;
    function f() public {
        uint[] immutable z = x;
        z = y;
    }
}
```
`immutable` looks just wrong for references you can reassign and pass around :-).
But I may be able to work around the parsing problem if we restrict to arrays only... so that would mean that the first version would only allow the new `code` data location for array types, not for structs, and we'd only add structs for 0.9 with `code` being a proper keyword... but I still need to check if that works...
> `immutable` looks just wrong for references you can reassign and pass around :-).

If we had a proper reference syntax we wouldn't need this new `code` keyword, methinks.
> `immutable` looks just wrong for references you can reassign and pass around :-).
>
> If we had a proper reference syntax we wouldn't need this new `code` keyword, methinks.

Can you elaborate on that? Reference syntax and data locations are pretty orthogonal.
The only reason we need the `code` data location in that statement is that otherwise it is unclear where it should be: "Is it a copy, is it a reference?"
> The only reason we need the `code` data location in that statement is that otherwise it is unclear where it should be: "Is it a copy, is it a reference?"

I honestly don't understand what you're saying :-). The `code` data location would distinguish state variables in storage from variables in the data section of the contract code - and distinguish references to things in code from references to things in storage. I see no relation to the question of references or copies whatsoever here.
Would love to see the `code` location modifier implemented!
Any chance to have it implemented?
Yes. It's on our roadmap, planned for Q3: #13723.
By the way: we may want to reconsider the name after all... it's not unlikely that one of the upcoming EVM upgrades will move from `codecopy` to `datacopy`, which robs us of the justification of the name :-).
Not that it's the most important holdup here. But open for other suggestions.
Still not a fan of continuing to call it `immutable`, due to local reference variables looking strange that way.
@ekpyron I would propose the name `codedata`, because of its similarity to `calldata` and the same keyword length.
Will we be able to use this keyword for constructor arguments?
I don't think it makes sense for arguments of constructors or external functions. What would that even mean?
Currently constructor arguments can be memory-only. I thought they could be `code`.
Technically it could, but I don't see any benefit in that. I.e. this would mean that we reserve space for them in the memory area where the constructor is building the runtime code and copy them there from calldata. This seems pointless because the runtime code would not be able to access them (the memory is there, but you cannot access a constructor's parameters outside of the constructor). You could achieve the same thing explicitly by declaring an immutable and copying the constructor parameter there yourself.
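For value types this is already the familiar pattern today (illustrative names; a reference-type immutable in `code` would work analogously once supported):

```solidity
contract Config {
    uint256 private immutable _fee;

    constructor(uint256 fee) {
        // the parameter itself is gone after construction, but the
        // immutable copy stays accessible from the runtime code
        _fee = fee;
    }

    function fee() external view returns (uint256) {
        return _fee;
    }
}
```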
> Currently constructor arguments can be memory-only.

Interesting. I actually did not know about that limitation and I don't see why it has to be there. In #9514 it looks like the thing being fixed was parameters without a location specified at all, and there's no explanation why calldata was excluded. It was done in 0.7.0, which was the time when calldata parameters were already allowed anywhere.

Could be that it was just considered a corner case not worth handling - in the constructor we have the whole contract bytecode in calldata, so the code finding the parameters has to account for that. @chriseth do you remember the context for that change?
@cameel as far as I can see, in the constructor `CALLDATASIZE()` is zero; I believe all the bytecode is located in `code`, not in `calldata`.
During contract creation you have a copy of the init code (including constructor arguments) in calldata, so `code` and `calldata` will contain the same data. And since `calldata` is more efficient than `code` due to having `calldataload`, it'd make more sense to make them `calldata` really (unless you want to reuse functions expecting `code` variables)... The main reason why we don't allow constructor arguments to actually be `calldata` arguments is that that'd require some custom logic, since the arguments are at "weird" offsets (past the initcode, not at offset zero), and that having them in memory makes it easier to have a derived contract pass down on-the-fly-generated data to a base constructor (however, we could just only allow passing unchanged calldata further to base constructors in these cases, similar to passing calldata arguments to other public functions). So there's no real blocker against this.
So yes, generally both allowing `code` as well as `calldata` as locations for constructor arguments makes sense - the latter could be done even before the `code` data location is implemented. It mainly hasn't been done due to time constraints and lack of demand so far. If you're interested in `calldata` constructor arguments, you can create a separate issue for it and make a case for their utility!
@ekpyron, strange, I see `msg.data.length` is zero in the constructor (and the same for assembly `calldatasize`):
```solidity
contract A {
    event Log(uint256);
    constructor(uint256 a, bytes memory b) {
        emit Log(msg.data.length);
    }
}
```
=>
```json
{
  "from": "0xf8e81D47203A594245E36C48e151709F0C19fBe8",
  "topic": "0x909c57d5c6ac08245cf2a6de3900e2b868513fa59099b92b27d8db823d92df9c",
  "event": "Log",
  "args": {
    "0": "0"
  }
}
```
@k06a Haha, that's fascinating, you're right! I didn't actually check the spec on this, but evmone, the execution engine we use for testing, apparently implements this with incorrect behaviour - I wouldn't have thought that possible. I'll file an issue with them - they should pass the consensus tests, so I'm a bit puzzled about how this can be.
In any case, at least `code` as data location for them will definitely be possible in the future.
@ekpyron what do you think about the idea to introduce a new EIP for a `CODELOAD(OFFSET)` operation to extract 32 bytes of code data to the stack? This could be really helpful for data stored in code/immutables.
> @ekpyron what do you think about the idea to introduce a new EIP for a `CODELOAD(OFFSET)` operation to extract 32 bytes of code data to the stack? This could be really helpful for data stored in code/immutables.

Yes, that would definitely be helpful - but it's hard these days to get an EIP accepted and live.
There is still a chance that EOF may happen eventually - the latest EOF spec does have both a stack-based `DATALOAD(offset)` and an even cheaper immediate-argument `DATALOADN` (which would be used to implement the current immutable mechanism).
For reference, this is the latest version of the EOF spec I'm aware of: https://notes.ethereum.org/@ipsilon/B1nnZ1fl3
It'd be quite nice if all of that actually went live at some point - it would make all contracts noticeably cheaper and have quite nice analysis properties - but the EIP process is complicated, straining and unpredictable; I wouldn't bet on the chances nor the ETA at this point.
I see `CODELOAD(offset)`/`DATALOAD(offset)` is equivalent to the following code:

```yul
let tmp := mload(0)        // Store the value of memory[0]
codecopy(0, offset, 32)    // Copy 32 bytes of bytecode to memory[0]
let result := mload(0)     // Read those 32 bytes onto the stack
mstore(0, tmp)             // Restore the value of memory[0]
```
> I see `CODELOAD(offset)`/`DATALOAD(offset)` is equivalent to the following code:
>
> ```yul
> let tmp := mload(0)        // Store the value of memory[0]
> codecopy(0, offset, 32)    // Copy 32 bytes of bytecode to memory[0]
> let result := mload(0)     // Read those 32 bytes onto the stack
> mstore(0, tmp)             // Restore the value of memory[0]
> ```

Yes - since we reserve scratch space, we could in practice even implement it as a plain

```yul
codecopy(0, offset, 32)
let result := mload(0)
```
Still quite a bit away from the presumed 3 gas a `codeload(offset)`/`dataload(offset)` would cost. But it's a good enough workaround as long as we don't have any additional opcodes.
@ekpyron if you tried to simulate this instruction in Yul/assembly, you would definitely need to back up and restore `memory[0:32]`.
We reserve the memory between 0 and 0x3f for temporary calculations, e.g. for hashing (the keccak opcode is also only defined for memory, and storage slot calculations often involve hashing of single words). So we generally don't rely on the contents of the lowest two words of memory retaining their values long term. User-defined assembly also generally may not assume that this memory range remains unchanged between two assembly blocks if there is high-level Solidity code between them. So for practical purposes, we don't need to save and restore by our conventions of memory use. If we wanted to simulate the opcode for general use, we'd have to, but given that that's more expensive due to saving and restoring in almost all cases, I'd not actually define a simulated instruction, but rather explicitly use scratch space for this.
I understand, but no one wants the following code to break unexpectedly:

```yul
mstore(0, typehash)
mstore(0x20, codeload(offset))
let hash := keccak256(0, 0x40)
```
Sure - which is why I wouldn't define a simulated `codeload` opcode. It'd be nice if we had it on the EVM level, but it's not an issue to live without it and manually copy to scratch space when necessary. But yeah, maybe not too productive to digress into too many details before the base feature of the data location is done.
BTW we could define it as a function once Solidity supports global assembly blocks:

```yul
function codeload(offset) -> value {
    let tmp := mload(0)          // save the scratch-space word
    codecopy(0, offset, 0x20)    // copy one word of code into scratch space
    value := mload(0)            // load it onto the stack
    mstore(0, tmp)               // restore the scratch-space word
}
```
Abstract
Add native support for dynamic `immutable` arrays, with a length defined at construction time (up to a maximum) and automatic bounds checks.

Motivation
`immutable` is a fantastic feature, allowing for highly efficient configuration access without forcing developers to use hardcoded constants, making testing and deployment much easier and more enjoyable. However, there is zero support for any kind of arrays.

What I describe can be manually implemented today in solc >0.7, but it leads to a very large amount of error-prone boilerplate. No alternative exists for gas-efficient access to these values. You can look at production code using this pattern here. Using the proposed features in such a contract would reduce source code size to less than a third of the original length, while also increasing auditability and maintainability.
Specification
'Dynamic' may be a confusing term. What I mean is that the contents and length would be dynamically defined at construction time. Read access with bounds checks using the `[]` operator and `.length` would both be supported. This feature set is essentially the same as that of memory arrays, except memory arrays can have their contents mutated.

Truly dynamic immutable arrays would be quite an undertaking, as the length of the runtime code would be dependent on construction arguments, but I'd argue that going that far is unnecessary. In all applications I've seen, there is a compile-time known maximum array length.
The proposed feature could therefore work as follows:
- The array is declared with a maximum length N (e.g. `uint256[] private immutable(50) foo`).
- During construction, `foo.length` and `foo[i]` can be used to write (not read) to the array. For simplicity, we could allow for each value to be written multiple times and perform no bounds check (other than `.length <= N` and `i < N`).
- After construction, `foo.length` returns the array's length and `foo[i]` performs a bounds check with `foo.length`.

Backwards Compatibility
This is a new feature using brand new (as of today invalid) syntax, so there should be no backwards compatibility implications.
Extra
Code samples
As mentioned above, this can be implemented today using standard Solidity. See for example how one might declare and use an array of up to 10 tokens:
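A shortened sketch of that pattern (three token slots instead of ten, illustrative names, and a minimal stand-in for the token interface) could look like this:

```solidity
interface IERC20 {
    // minimal stand-in for the real token interface
}

contract TokenSet {
    uint256 private immutable _totalTokens;

    IERC20 private immutable _token0;
    IERC20 private immutable _token1;
    IERC20 private immutable _token2;
    // ...one immutable per potential slot, up to the chosen maximum

    constructor(IERC20[] memory tokens) {
        require(tokens.length <= 3, "too many tokens");
        _totalTokens = tokens.length;

        _token0 = tokens.length > 0 ? tokens[0] : IERC20(address(0));
        _token1 = tokens.length > 1 ? tokens[1] : IERC20(address(0));
        _token2 = tokens.length > 2 ? tokens[2] : IERC20(address(0));
    }

    function _getToken(uint256 index) internal view returns (IERC20) {
        require(index < _totalTokens, "out of bounds");
        if (index == 0) return _token0;
        if (index == 1) return _token1;
        return _token2;
    }
}
```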
This is of course very error-prone and hard to maintain, but it is the best way to get the desired behavior. With the suggested feature, we get the exact same bytecode size and performance using the following code:
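And a hypothetical sketch of what that could look like with the proposed syntax (again with illustrative names; none of this compiles today):

```solidity
interface IERC20 {
    // minimal stand-in for the real token interface
}

contract TokenSet {
    // hypothetical syntax: an immutable array with a maximum length of 10
    IERC20[] private immutable(10) _tokens;

    constructor(IERC20[] memory tokens) {
        require(tokens.length <= 10, "too many tokens");
        _tokens.length = tokens.length;  // the length is written once during construction
        for (uint256 i = 0; i < tokens.length; ++i) {
            _tokens[i] = tokens[i];      // element writes are only allowed during construction
        }
    }

    function _getToken(uint256 index) internal view returns (IERC20) {
        return _tokens[index];           // bounds-checked against _tokens.length
    }
}
```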