OpenZeppelin / openzeppelin-contracts

OpenZeppelin Contracts is a library for secure smart contract development.
https://openzeppelin.com/contracts
MIT License

Add a function for shrinking memory arrays #3491

Closed CodeSandwich closed 2 years ago

CodeSandwich commented 2 years ago

🧐 Motivation

Solidity in-memory arrays can't grow, because reallocation would be expensive and difficult to perform safely. But they can safely shrink.

Quite often a memory array needs to be passed around together with a separate length variable to simulate a dynamic data structure. In some cases the logical length only ever decreases, e.g. when items are popped and consumed. This is both inefficient and dangerous, because Solidity only checks array.length when bounds-checking accessed indexes; it knows nothing about the separate variable.
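The hazard can be sketched like this (illustrative code, not from any particular project):

```solidity
// The caller over-allocates and tracks the logical size separately.
uint256[] memory items = new uint256[](10);
uint256 used = 4; // only the first 4 slots are meaningful

// Solidity's bounds check only knows about items.length (10), so this
// read of a meaningless slot passes the check and silently returns 0:
uint256 stale = items[7];
```

With shrinking, items.length itself would be the single source of truth.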

📝 Details

Add a function to library Arrays:

function shrinkArray(uint256[] memory array, uint256 newLength) internal pure returns (uint256[] memory) {
    require(newLength <= array.length, "Array: length after shrinking larger than before");
    /// @solidity memory-safe-assembly
    assembly {
        // The first word of a memory array holds its length; overwriting
        // it with a smaller value "shrinks" the array in place.
        mstore(array, newLength)
    }
    return array;
}

Edit: I think that shrinkArray is too verbose; just shrink would be a better name. It'd be called as Arrays.shrink(myArray, len), or, with using Arrays for uint256[], just as myArray.shrink(len).
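The two call styles would look like this (a sketch; shrink is the function proposed in this issue and does not exist in the released Arrays library):

```solidity
import {Arrays} from "@openzeppelin/contracts/utils/Arrays.sol";

contract ShrinkDemo {
    using Arrays for uint256[];

    function demo() internal pure returns (uint256[] memory a) {
        a = new uint256[](10);
        a = Arrays.shrink(a, 5); // explicit library call
        a = a.shrink(3);         // attached via `using Arrays for uint256[]`
        // a.length is now 3
    }
}
```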

Amxx commented 2 years ago

Hello @CodeSandwich

Do you have any concrete examples of functions/use cases that need this feature?

CodeSandwich commented 2 years ago

Sure!

Take a look here: https://github.com/radicle-dev/drips-contracts/blob/189d4bfefa633733576303a8d01b58e3ea8f412e/src/Drips.sol#L363 We're consuming items from a memory array. The array is allocated to hold a certain number of items, but usually it's not entirely filled, so the consuming function needs a length parameter. With shrinkArray we could shrink out the unused entries and rely on the array's .length instead of a separate variable.

Another example from the same contract: https://github.com/radicle-dev/drips-contracts/blob/189d4bfefa633733576303a8d01b58e3ea8f412e/src/Drips.sol#L445 We're iterating over 2 memory arrays, currReceivers and newReceivers, in lockstep. They're sorted, and we're trying to find matching items from each of the lists to consume them together, or separately if there's no match. That requires maintaining 2 variables, currIdx and newIdx, which act as iterators: they tell us which item from each of the arrays is being consumed. Each of them must be checked against the length of the respective array to prevent out-of-bounds reads. This function is also constantly fighting stack depth; there are too many items on the stack. With shrinkArray the iteration could be done in reverse and these variables could be dropped entirely; all we'd need to check is whether .length == 0. It'd also be impossible to accidentally read an already-consumed item.
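A rough sketch of that reverse, shrink-based lockstep iteration (heavily simplified; the real Drips code does much more per item, and shrinkArray is the function proposed in this issue):

```solidity
function mergeReverse(uint256[] memory curr, uint256[] memory next) internal pure {
    // Both arrays are sorted ascending; iterate from the back.
    // No currIdx/newIdx variables: the arrays' own lengths are the state.
    while (curr.length > 0 || next.length > 0) {
        if (next.length == 0 || (curr.length > 0 && curr[curr.length - 1] > next[next.length - 1])) {
            // consume curr's tail item alone
            shrinkArray(curr, curr.length - 1);
        } else if (curr.length == 0 || next[next.length - 1] > curr[curr.length - 1]) {
            // consume next's tail item alone
            shrinkArray(next, next.length - 1);
        } else {
            // equal tail items: consume them together
            shrinkArray(curr, curr.length - 1);
            shrinkArray(next, next.length - 1);
        }
    }
}
```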

I know that they are NOT working on uint256s, but given Solidity's current lack of generics I sometimes find myself copy-pasting code from OZ and adapting it for my types. It'd be a good template even if the types didn't match.

CodeSandwich commented 2 years ago

When I'm thinking about it, a function popArray could be a very useful tool built on top 🤔

function popArray(uint256[] memory array) internal pure returns (uint256 value) {
    require(array.length > 0, "Array: zero length arrays can't be popped");
    uint256 newLen = array.length - 1;
    // Read the last element, then shrink the array by one.
    value = array[newLen];
    shrinkArray(array, newLen);
}
Amxx commented 2 years ago

The array is allocated to hold a certain number of items, but usually it's not entirely filled, so the consuming function needs a length parameter. With shrinkArray we could shrink out the unused entries and rely on the array's .length instead of a separate variable.

Once shrunk, you would not know if it can be expanded back though :/ Also, storing the length in a variable is possibly cheaper than having to mstore and mload it. It's definitely a design decision, but IMO having a length param is not a bad thing.

Amxx commented 2 years ago

When I'm thinking about it, a function popArray could be a very useful tool built on top 🤔

function popArray(uint256[] memory array) internal pure returns (uint256 value) {
    require(array.length > 0, "Array: zero length arrays can't be popped");
    uint256 newLen = array.length - 1;
    value = array[newLen];
    shrinkArray(array, newLen);
}

But then you can't push ... and you are losing track of memory pretty fast :/

Amxx commented 2 years ago

I'm honestly not comfortable with this idea of using the array length as a solution to avoid an "iterator" variable, mostly because it loses data that can't be recovered, and possibly because it might be more expensive (we should benchmark that).

I'm curious what @frangio thinks.

CodeSandwich commented 2 years ago

I wrote a tiny benchmark of shrinkArray and the result is 82 gas:

uint256[] memory array = new uint256[](123);
uint gas = gasleft();
shrinkArray(array, 122);
gas -= gasleft();
Amxx commented 2 years ago

That sounds like a lot compared to updating an iterator.

CodeSandwich commented 2 years ago

Iteration is not the only use case; any place where an array is drained, or built in an over-allocated region, can benefit from shrinking to prevent out-of-bounds access.
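The "build in an over-allocated region, then shrink to fit" pattern could look like this (a sketch, assuming shrinkArray as proposed above):

```solidity
// Keep only the even numbers from `input`. The result's .length is
// exact, so callers can't accidentally read the unused slots.
function filterEven(uint256[] memory input) internal pure returns (uint256[] memory out) {
    out = new uint256[](input.length); // worst-case allocation
    uint256 count = 0;
    for (uint256 i = 0; i < input.length; i++) {
        if (input[i] % 2 == 0) {
            out[count++] = input[i];
        }
    }
    shrinkArray(out, count); // trim to the actual size
}
```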

But ok, let's go with the iteration use case. I've optimized the proposed popArray, it's still as safe to use as regular array access:

function popArray(uint256[] memory array) internal pure returns (uint256 value) {
    uint256 length = array.length;
    require(length > 0, "Array: zero length arrays can't be popped");
    /// @solidity memory-safe-assembly
    assembly {
        // The element at index `length - 1` lives `length` words past
        // the length slot that `array` points to.
        value := mload(add(array, mul(length, 32)))
        // Shrink by overwriting the length word.
        mstore(array, sub(length, 1))
    }
}

Here's the benchmark, on a 100-item array:

uint256[] memory array = new uint256[](100);

uint256 sum1;
uint256 gas1 = gasleft();
for (uint256 i = 0; i < array.length; i++) {
    sum1 += array[i];
}
gas1 -= gasleft();

uint256 sum2;
uint256 gas2 = gasleft();
while (array.length > 0) {
    sum2 += popArray(array);
}
gas2 -= gasleft();

The results are:

Without the mload in Yul, but with a regular value = array[length - 1], popping is 25831 gas, still very much comparable, at 14 gas more per iteration.

I don't want to sound like a maniac; it's not like my life depends on this feature. If you don't want to add it, then you won't, and I don't have any say in this matter. The only things I'm arguing with here are the specific statements.

Amxx commented 2 years ago

You are benchmarking a lot of extra stuff:

uint256[] memory array = new uint256[](100);
uint256 sum1;
uint256 sum2;
uint256 sum3;

uint256 gas1 = gasleft();
for (uint256 i = 0; i < array.length; i++) {
    sum1 += array[i];
}
gas1 -= gasleft();

uint256 gas2 = gasleft();
unchecked {
    uint256 length = array.length;
    for (uint256 i = 0; i < length; i++) {
        sum2 += array[i];
    }
}
gas2 -= gasleft();

uint256 gas3 = gasleft();
while (array.length > 0) {
    sum3 += popArray(array);
}
gas3 -= gasleft();

With my compiler settings I get:


The basic iteration mloads the length on every loop iteration (and so does the popping) ... that costs a lot. The basic iteration also does checked math on i++ and bounds checks on array[i] ... that also costs a lot. These two account for more than half of the total cost.

frangio commented 2 years ago

I don't think this function is a good idea. It's not like shrinking the array will free up memory. For iteration I'm pretty sure using a variable in the stack will be more efficient. And for use cases that relate to allocating an array that is too large and then shrinking it to size, I think that's pretty advanced and it can be written using assembly. I'm also reluctant to manipulate memory in this way, I'd rather people did it explicitly using assembly so that they can be more aware of the risks.
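The explicit, call-site version being suggested might look something like this (a sketch):

```solidity
uint256[] memory items = new uint256[](10);
// ... fill only the first 4 slots ...
assembly {
    // Overwrite the length word in place. Doing this inline, rather
    // than behind a library call, keeps the memory manipulation (and
    // its risks) visible to whoever reads the contract.
    mstore(items, 4)
}
```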

@Amxx Note that your benchmark is also using unchecked for sum2 += array[i] in your second case. This is messing up the comparison, it's not what we want to measure.

CodeSandwich commented 2 years ago

That's fair, thank you for looking into this.

What risks are you seeing? My main goal was to fool-proof non-full arrays, do you see a way for a user to harm themselves by using a function like this?

frangio commented 2 years ago

I'm concerned about the interaction with Solidity optimizations. I think that as long as the assembly block isn't marked "memory-safe" things should be fine, but there is some risk... See the recent compiler bug related to memory and assembly.
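Under that reasoning, a cautious variant of the proposed function would simply omit the annotation, so the optimizer cannot assume the block leaves memory in a consistent state (a sketch):

```solidity
function shrinkArray(uint256[] memory array, uint256 newLength) internal pure returns (uint256[] memory) {
    require(newLength <= array.length, "Array: length after shrinking larger than before");
    // No `@solidity memory-safe-assembly` annotation here: the
    // compiler must treat this block as potentially touching
    // arbitrary memory, disabling the risky optimizations.
    assembly {
        mstore(array, newLength)
    }
    return array;
}
```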

rgenito commented 4 months ago

For what it is worth, shrinking the array can be useful when you are building an array of NFT IDs. Suppose the NFT ID 0 is not allowed and simply means "nothing was minted". You then run a batch function that mints a bunch of NFTs and logs an array of all of the NFT IDs that were minted. Each extra byte of log data costs another 8 gas (and this gas cost is subject to change in future EVMs!), so every zero-valued entry excluded from the logged array saves 256 gas for its 32-byte word. That can really add up.

And therefore, I find it very much worth it to shrink an array! Example code...

event idLog(address indexed user, uint256[] ids);

function mintBatch(uint256[] memory something) internal {
    uint256[] memory ids = new uint256[](something.length);

    uint256 offset;
    // iterate through `something` and determine whether or not an NFT is minted (id != 0);
    // however, we will only add the NFT ID to the id list if the ID is NOT 0.
    for (uint256 i = 0; i < something.length; i++) {
        uint256 nftid = _mintOne(something[i]); // hypothetical minting helper
        if (nftid != 0) {
            // of course, do this unchecked
            unchecked {
                ids[offset++] = nftid;
            }
        }
    }

    if (offset != something.length) {
        shrinkArray(ids, offset);
    }

    // may as well NOT log when there are no NFTs minted...
    if (offset != 0) {
        emit idLog(msg.sender, ids);
    }
}