Enable setting and getting multiple bits in `Bitmaps.sol`

jesperkristensen58 commented 1 year ago

🧐 Motivation

The Bitmaps library can be greatly expanded in its usefulness if setting and getting multiple bits is enabled

This would be helpful, e.g., in an NFT bitmap-based airdrop where we start with 1-bits and switch them to 0 as mints occur. This would also be helpful in the case where a protocol is using bitpacking (e.g., this is directly useful in Panoptic), as in: https://twitter.com/cryptojesperk/status/1613207330782416897?s=20&t=dDcU6hQPJEGzoo8sX8PJ8A

📝 Details These new functions are added via function overloading (the argument list is changed from taking an index to a startIndex and num):

set(BitMap storage bitmap, uint256 startIndex, uint256 num)
get(BitMap storage bitmap, uint256 startIndex, uint256 num)
setTo(BitMap storage bitmap, uint256 startIndex, uint256 num, bool value)
unset(BitMap storage bitmap, uint256 startIndex, uint256 num)

Where "startIndex" is the starting index of the first bit to interact with. "num" is the number of bits to interact with (set or unset) from the startIndex onwards.

Consider the following 256 bits (32 bytes, or 1 word) example:

exampleBitmap = 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Currently, we can set or unset 1 bit at a time. With these new functions, we can set n bits at once.

For example:

exampleBitmap = exampleBitmap.set(5, 10);

sets the bits starting at index 5 to 15 to 1 - and leaves the rest at 0 - in a single function call:

0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111110000

And we can get those bits back:

exampleBitmap = exampleBitmap.get(5, 10);

Returns the bits:

1111111111

The bits can be returned in various formats to be decided. Potentially as an array of uint256s. An event would be emitted also detailing the start index and the number of bits that are returned.

The difficult parts will be when bits are set across overlapping _data elements, but can be handled, among other complex situations (which can all be handled).

Amxx commented 1 year ago

Hello @jesperkristensen58

While I like the general idea, I'm uncomfortable with unbounded operation. If num is big, than the gas cost might start being an issue. Maybe num should be a uint8, limiting the size to 255, which would fit in a bytes32 (and would require read/write to at most 2 storage slots).

Additionally, this library is just a data structure, it should be fully transparent to the devs using it, so I don't think emitting an event is acceptable. An analogy would be if a standard C++ structure such as std::vector did a printf to the console. Nobody wants that.

Is this issue just theoretical, or do you have a real usecase motivating it ? I'd love to know more about how it would be used.

jesperkristensen58 commented 1 year ago

Thanks for your quick reply, Hadrien!

Maybe num should be a uint8, limiting the size to 255

Your suggestion to cut the complexity of the proposal down to at least being able to manage two words while having the capacity to set multiple bits is an excellent one. The price of gas is an important factor. And this will make it possible to use the uint8 as well of course.

so I don't think emitting an event is acceptable

I concur with you completely. Let's eliminate the event specific. If necessary, any further information could be included in the return data.

Is this issue just theoretical, or do you have a real usecase motivating it ?

Current Solution

Yes, we are essentially utilizing bitpacking in the development of Panoptic - it's not theoretical, but highly practical and will, in fact, be implemented in our DeFi Options protocol in production: Please see http://www.panoptic.xyz/ for context.

This tweet demonstrates how this is now achieved (but we have our own implementation): https://twitter.com/cryptojesperk/status/1613207330782416897

So, in short, we need a way to set and get n bits in a word starting at an arbitrary index. This allows us to pack variables of any number of bits easily. Having this be available for a single word makes sense.

(In fact, in our case we'd always work within the first word, aka bucket with index 0: _data[0]; but that's just a detail, in general it could be helpful to have the potential _data[0] to _data[1] crossing happen if needed so I like your idea).

Bitmaps used as a Current Solution

To be sure, we can solve the problem using the current library, but it becomes cumbersome and inefficient since we need a for loop.

Best Solution: Generalize Bitmaps.sol

The solution here could use a simple (generalized) mask but with the complexity of correctly handling "bucket crossings."

Amxx commented 1 year ago

For the record, this may be a possible (untested) implementation:

    function mkmask(uint8 leftZeros, uint8 rightZeros) internal pure returns (uint256) {
        return type(uint256).max << leftZeros >> leftZeros >> rightZeros << rightZeros;
    }

    function setTo(uint256 index, uint8 length, bool value) internal {
        if (length == 0) {
            return;
        }

        uint256 bucket = index / 256;
        uint256 start  = index % 256;
        uint256 end    = start + length;

        if (end <= 256) {
            uint8 leftZero = uint8(start);
            uint8 rightZero = uint8(256 - end);
            setBucketTo(bucket, mkmask(leftZero, rightZero), value);
        } else {
            uint8 leftZero1 = uint8(start);
            uint8 rightZero1 = uint8(0);
            setBucketTo(bucket, mkmask(leftZero1, rightZero1), value);

            uint8 leftZero2 = uint8(0);
            uint8 rightZero2 = uint8(512 - end);
            setBucketTo(bucket + 1, mkmask(leftZero2, rightZero2), value);
        }
    }

    function setBucketTo(uint256 bucket, uint256 mask, bool value) internal {
        if (value) {
            setBucket(bucket, mask);
        } else {
            unsetBucket(bucket, mask);
        }
    }

    function setBucket(uint256 bucket, uint256 mask) internal {
        _data[bucket] |= mask;
    }

    function unsetBucket(uint256 bucket, uint256 mask) internal {
        _data[bucket] &= ~mask;
    }

jesperkristensen58 commented 1 year ago

Yes, this would work (I will test as well). Creating a PR.

How do you prefer the get method to work?

Do we return a uint256 where we shift the bits to the start of this uint? And then it's up to the user to know that these are the bits from start to end?

Example: (this is not a full bucket, but just to get the point across)

00000001111110000000 Here the first 1 is at index 7

The user calls bitmap.get(7, 7) and wants this chunk (boundaries highlighted with "-"'s):

000000 - 0111111 - 0000000

So we can return it as:

1) a single uint256: 00000000000...0000 - 0111111

the user then will need to understand that this is from index 7 to 13 in the bitmap - this will be made clear in the documentation. Of course they know what they called - they provided index and length.

This would of course still be a single uint256 even if a bucket crossing happens.

2) at most two uint256's where the bits are positioned like in the bitmap's bucket(s):

000000 - 0111111 - 0000000

and the user still needs to know that indices 7 to 13 are of interest. Two uint256's are needed for bucket crossings (and could be returned in general where the second uint256 is just zero in case no crossing happens).

3) either bullet above but in addition, returning explicitly the start and end indexes as well to be sure (however, the user called with index and length so already has this info).

Either way, the documentation will of course be made clear on exactly what is returned, and examples should be provided in the docstrings.

jesperkristensen58 commented 1 year ago

Still untested, but without the get(...) function pending your answer, here is what I got - thanks for your example:

// SPDX-License-Identifier: MIT
// OpenZeppelin Contracts (last updated v4.8.0) (utils/structs/BitMaps.sol)
pragma solidity ^0.8.0;

/**
 * @dev Library for managing uint256 to bool mapping in a compact and efficient way, providing the keys are sequential.
 * Largely inspired by Uniswap's https://github.com/Uniswap/merkle-distributor/blob/master/contracts/MerkleDistributor.sol[merkle-distributor].
 */
library BitMaps {
    struct BitMap {
        mapping(uint256 => uint256) _data; // _data[bucket]
    }

    /**
     * @dev Returns whether the bit at `index` is set.
     */
    function get(BitMap storage bitmap, uint256 index) internal view returns (bool) {
        uint256 bucket = index >> 8;
        uint256 mask = 1 << (index & 0xff);
        return bitmap._data[bucket] & mask != 0;
    }

    /**
     * @dev Sets the bit at `index` to the boolean `value`.
     */
    function setTo(BitMap storage bitmap, uint256 index, bool value) internal {
        if (value) {
            set(bitmap, index);
        } else {
            unset(bitmap, index);
        }
    }

    /**
     * @dev Sets the bit at `index`.
     */
    function set(BitMap storage bitmap, uint256 index) internal {
        uint256 bucket = index >> 8;
        uint256 mask = 1 << (index & 0xff);
        bitmap._data[bucket] |= mask;
    }

    /**
     * @dev Unsets the bit at `index`.
     */
    function unset(BitMap storage bitmap, uint256 index) internal {
        uint256 bucket = index >> 8;
        uint256 mask = 1 << (index & 0xff);
        bitmap._data[bucket] &= ~mask;
    }

    /**
     * @dev Sets `length` bits starting at `index` to the boolean `value`.
     */
    function setTo(BitMap storage bitmap, uint256 index, uint8 length, bool value) internal {
        if (length == 0) {
            return;
        }

        if (value) {
            set(bitmap, index, length);
        } else {
            unset(bitmap, index, length);
        }
    }

    /**
     * @dev Sets `length` bits starting at `index`.
     */
    function set(BitMap storage bitmap, uint256 index, uint8 length) internal {
        if (length == 0) {
            return;
        }

        uint256 bucket = index / 256;
        uint256 start = index % 256;
        uint256 end = start + length;

        if (end <= 256) {
            uint8 leftZeroes = uint8(start);
            uint8 rightZeroes = uint8(256 - end);
            uint256 mask = mkmask(leftZeroes, rightZeroes);
            setBucket(bitmap, bucket, mask);
        } else {
            uint8 leftZeroes1 = uint8(start);
            uint8 rightZeroes1 = uint8(0);
            uint256 mask1 = mkmask(leftZeroes1, rightZeroes1);
            setBucket(bitmap, bucket, mask1);

            uint8 leftZeroes2 = uint8(0);
            uint8 rightZeroes2 = uint8(512 - end);
            uint256 mask2 = mkmask(leftZeroes2, rightZeroes2);
            setBucket(bitmap, bucket + 1, mask2);
        }
    }

    /**
     * @dev Unsets `length` bits starting at `index`.
     */
    function unset(BitMap storage bitmap, uint256 index, uint8 length) internal {
        if (length == 0) {
            return;
        }

        uint256 bucket = index / 256;
        uint256 start = index % 256;
        uint256 end = start + length;

        if (end <= 256) {
            uint8 leftZeroes = uint8(start);
            uint8 rightZeroes = uint8(256 - end);
            uint256 mask = mkmask(leftZeroes, rightZeroes);
            unsetBucket(bitmap, bucket, mask);
        } else {
            uint8 leftZeroes1 = uint8(start);
            uint8 rightZeroes1 = uint8(0);
            uint256 mask1 = mkmask(leftZeroes1, rightZeroes1);
            unsetBucket(bitmap, bucket, mask1);

            uint8 leftZeroes2 = uint8(0);
            uint8 rightZeroes2 = uint8(512 - end);
            uint256 mask2 = mkmask(leftZeroes2, rightZeroes2);
            unsetBucket(bitmap, bucket + 1, mask2);
        }
    }

    /**
     * @dev Sets a `bucket` in the bitmap under the `mask`.
     */
    function setBucket(BitMap storage bitmap, uint256 bucket, uint256 mask) internal {
        bitmap._data[bucket] |= mask;
    }

    /**
     * @dev Unsets a `bucket` in the bitmap under the `mask`.
     */
    function unsetBucket(BitMap storage bitmap, uint256 bucket, uint256 mask) internal {
        bitmap._data[bucket] &= ~mask;
    }

    /**
     * @dev Make a bitmask with `leftZeros` number of left zero-bits and `rightZeros` number of right zero-bits.
     */
    function mkmask(uint8 leftZeros, uint8 rightZeros) internal pure returns (uint256) {
        return type(uint256).max << leftZeros >> leftZeros >> rightZeros << rightZeros;
    }
}

Amxx commented 1 year ago

Creating a PR.

Please don't. We would still need some internal discussion about this before pulling the trigger on that.

jesperkristensen58 commented 1 year ago

Creating a PR.

Please don't. We would still need some internal discussion about this before pulling the trigger on that.

No problem.

Another practical application is Bitmap-based NFT airdrops, in which a group of bits (say 5,000 - it would be the collection size) are initially set and subsequently unset when users mint the NFT. As numerous buckets are required to first set the bits, this new code would also be beneficial.

Amxx commented 1 year ago

Note that because of the limitations of uint8, the set(BitMap storage bitmap, uint256 index, uint8 length) and unset functions are limited to a length of 255. This doesn't allow setting an entire bucket at once (a bucket has 256 items)

jesperkristensen58 commented 1 year ago

It would be nice to be able to set a full bucket.

We can either create a special function for this, or increase the uint8 for that reason.

Amxx commented 1 year ago

If we increase the uint8, the next type is uint16... but then if someone provides the max value for this type, that implies updating 256 buckets, which is not ok.

I'm not a big fan of doing require(length <= 256)

jesperkristensen58 commented 1 year ago

so can we do start index and end index instead?

Amxx commented 1 year ago

Start index must be uint256. I guess end index also needs to be uint256. Then we would need to check that startIndex < endIndex <= startIndex + 256

not sure that is any better

jesperkristensen58 commented 1 year ago

True, I actually in that case like the uint16 better, but overall does not get rid of the check. We already check for length == 0 though ...

Amxx commented 1 year ago

length == 0 is a no-op, not an error. We can return without doing anything and that is valid.

require(length <= 256) is different because it reverts, which is way worst. Even with documentation, there is always someone that calls the function with a parameter that is uint16 ... and the contract will work for a while until the value becomes too big. At that point the library reverts, and that could completely block the contract ... resulting in a potential loss of funds.

jesperkristensen58 commented 1 year ago

i was thinking of no-opping on <= 256 too.

but if no, we could:

1) write another function to set the full bucket

or

2) introduce a new variable, maybe a bool, saying "setAllBits" or something like that.

jesperkristensen58 commented 1 year ago

Or we redefine what "0" means in "length" (potentially renaming "length")

frangio commented 1 year ago

To summarize, this is a proposal to add a function setTo(BitMap storage bitmap, uint256 index, uint8 length, bool value) that will set length bits starting at index to 1 or 0 according to value.

I don't think it's a big problem if length=256 can't be expressed.

jesperkristensen58 commented 1 year ago

Thanks, @frangio.

It would be valuable having a way to set a bucket entirely to 1 or 0 (due to the airdrop example mentioned, and to make it less strange). Are you saying that we can still do this even with uint8? Or that we shouldn't support this?

Currently:

bitmap.setTo(0, 256, 1)

Would set all the bucket's bits to 1 but can't be done now.

Unless we let:

bitmap.setTo(0, 0, 1)

mean "set the bit in the first index to 1".

Then this would set all bits, and can be done: bitmap.setTo(0, 255, 1).

What are your thoughts on this?

frangio commented 1 year ago

Unless we let: bitmap.setTo(0, 0, 1) mean "set the bit in the first index to 1".

This is not intuitive and people might not read the docs well enough.

This would be helpful, e.g., in an NFT bitmap-based airdrop where we start with 1-bits and switch them to 0 as mints occur.

Please note that this is not good justification... Such a contract should start with 0-bits because it's the default value.

jesperkristensen58 commented 1 year ago

This is not intuitive and people might not read the docs well enough.

I agree. I do like the uint16 the best and perhaps no-op anything >256. seems cleanest and simplest.

Please note that this is not good justification... Such a contract should start with 0-bits because it's the default value.

The reason for starting at 1 and converting to 0 is due to gas savings. Setting non-zero to zero is the least costly and will save users lots of money. There are many use-cases like this.

frangio commented 1 year ago

Ok so what you're saying is that as the deployer you would set all to 1 so that users don't need to pay the cost of changing zero to non-zero? That's interesting.

What I'm starting to consider is that perhaps we should just allow length > 256 and alert people that using an unbounded length will consume unbounded gas. Do we have other cases of unbounded execution in the library currently?

jesperkristensen58 commented 1 year ago

Ok so what you're saying is that as the deployer you would set all to 1 so that users don't need to pay the cost of changing zero to non-zero? That's interesting.

Exactly.

What I'm starting to consider is that perhaps we should just allow length > 256 and alert people that using an unbounded length will consume unbounded gas.

I'm fully onboard with this.

Do we have other cases of unbounded execution in the library currently?

This was from a quick glance: https://github.com/OpenZeppelin/openzeppelin-contracts/blob/master/contracts/utils/structs/EnumerableSet.sol#L216

jesperkristensen58 commented 1 year ago

@frangio

frangio commented 1 year ago

Sorry @jesperkristensen58 this will not make it in the next release.

jesperkristensen58 commented 1 year ago

No problem; thanks for the update

OpenZeppelin / openzeppelin-contracts

Enable setting and getting multiple bits in `Bitmaps.sol` #3963