Open jesperkristensen58 opened 1 year ago
Hello @jesperkristensen58
While I like the general idea, I'm uncomfortable with unbounded operation. If num
is big, than the gas cost might start being an issue. Maybe num
should be a uint8, limiting the size to 255, which would fit in a bytes32 (and would require read/write to at most 2 storage slots).
Additionally, this library is just a data structure, it should be fully transparent to the devs using it, so I don't think emitting an event is acceptable. An analogy would be if a standard C++ structure such as std::vector
did a printf
to the console. Nobody wants that.
Is this issue just theoretical, or do you have a real usecase motivating it ? I'd love to know more about how it would be used.
Thanks for your quick reply, Hadrien!
Maybe num should be a uint8, limiting the size to 255
Your suggestion to cut the complexity of the proposal down to at least being able to manage two words while having the capacity to set multiple bits is an excellent one. The price of gas is an important factor. And this will make it possible to use the uint8
as well of course.
so I don't think emitting an event is acceptable
I concur with you completely. Let's eliminate the event
specific. If necessary, any further information could be included in the return data.
Is this issue just theoretical, or do you have a real usecase motivating it ?
Current Solution
Yes, we are essentially utilizing bitpacking in the development of Panoptic - it's not theoretical, but highly practical and will, in fact, be implemented in our DeFi Options protocol in production: Please see http://www.panoptic.xyz/ for context.
This tweet demonstrates how this is now achieved (but we have our own implementation): https://twitter.com/cryptojesperk/status/1613207330782416897
So, in short, we need a way to set and get n
bits in a word starting at an arbitrary index. This allows us to pack variables of any number of bits easily. Having this be available for a single word makes sense.
(In fact, in our case we'd always work within the first word, aka bucket with index 0: _data[0]; but that's just a detail, in general it could be helpful to have the potential _data[0] to _data[1] crossing happen if needed so I like your idea).
Bitmaps used as a Current Solution
To be sure, we can solve the problem using the current library, but it becomes cumbersome and inefficient since we need a for loop.
Best Solution: Generalize Bitmaps.sol
The solution here could use a simple (generalized) mask but with the complexity of correctly handling "bucket crossings."
For the record, this may be a possible (untested) implementation:
function mkmask(uint8 leftZeros, uint8 rightZeros) internal pure returns (uint256) {
return type(uint256).max << leftZeros >> leftZeros >> rightZeros << rightZeros;
}
function setTo(uint256 index, uint8 length, bool value) internal {
if (length == 0) {
return;
}
uint256 bucket = index / 256;
uint256 start = index % 256;
uint256 end = start + length;
if (end <= 256) {
uint8 leftZero = uint8(start);
uint8 rightZero = uint8(256 - end);
setBucketTo(bucket, mkmask(leftZero, rightZero), value);
} else {
uint8 leftZero1 = uint8(start);
uint8 rightZero1 = uint8(0);
setBucketTo(bucket, mkmask(leftZero1, rightZero1), value);
uint8 leftZero2 = uint8(0);
uint8 rightZero2 = uint8(512 - end);
setBucketTo(bucket + 1, mkmask(leftZero2, rightZero2), value);
}
}
function setBucketTo(uint256 bucket, uint256 mask, bool value) internal {
if (value) {
setBucket(bucket, mask);
} else {
unsetBucket(bucket, mask);
}
}
function setBucket(uint256 bucket, uint256 mask) internal {
_data[bucket] |= mask;
}
function unsetBucket(uint256 bucket, uint256 mask) internal {
_data[bucket] &= ~mask;
}
Yes, this would work (I will test as well). Creating a PR.
get
method to work?Do we return a uint256 where we shift the bits to the start of this uint? And then it's up to the user to know that these are the bits from start
to end
?
Example: (this is not a full bucket, but just to get the point across)
00000001111110000000
Here the first 1 is at index 7
The user calls bitmap.get(7, 7)
and wants this chunk (boundaries highlighted with "-"'s):
000000 - 0111111 - 0000000
So we can return it as:
1) a single uint256: 00000000000...0000 - 0111111
the user then will need to understand that this is from index 7 to 13 in the bitmap - this will be made clear in the documentation. Of course they know what they called - they provided index
and length
.
This would of course still be a single uint256 even if a bucket crossing happens.
2) at most two uint256's where the bits are positioned like in the bitmap's bucket(s):
000000 - 0111111 - 0000000
and the user still needs to know that indices 7 to 13 are of interest. Two uint256's are needed for bucket crossings (and could be returned in general where the second uint256 is just zero in case no crossing happens).
3) either bullet above but in addition, returning explicitly the start
and end
indexes as well to be sure (however, the user called with index
and length
so already has this info).
Either way, the documentation will of course be made clear on exactly what is returned, and examples should be provided in the docstrings.
Still untested, but without the get(...)
function pending your answer, here is what I got - thanks for your example:
// SPDX-License-Identifier: MIT
// OpenZeppelin Contracts (last updated v4.8.0) (utils/structs/BitMaps.sol)
pragma solidity ^0.8.0;
/**
* @dev Library for managing uint256 to bool mapping in a compact and efficient way, providing the keys are sequential.
* Largely inspired by Uniswap's https://github.com/Uniswap/merkle-distributor/blob/master/contracts/MerkleDistributor.sol[merkle-distributor].
*/
library BitMaps {
struct BitMap {
mapping(uint256 => uint256) _data; // _data[bucket]
}
/**
* @dev Returns whether the bit at `index` is set.
*/
function get(BitMap storage bitmap, uint256 index) internal view returns (bool) {
uint256 bucket = index >> 8;
uint256 mask = 1 << (index & 0xff);
return bitmap._data[bucket] & mask != 0;
}
/**
* @dev Sets the bit at `index` to the boolean `value`.
*/
function setTo(BitMap storage bitmap, uint256 index, bool value) internal {
if (value) {
set(bitmap, index);
} else {
unset(bitmap, index);
}
}
/**
* @dev Sets the bit at `index`.
*/
function set(BitMap storage bitmap, uint256 index) internal {
uint256 bucket = index >> 8;
uint256 mask = 1 << (index & 0xff);
bitmap._data[bucket] |= mask;
}
/**
* @dev Unsets the bit at `index`.
*/
function unset(BitMap storage bitmap, uint256 index) internal {
uint256 bucket = index >> 8;
uint256 mask = 1 << (index & 0xff);
bitmap._data[bucket] &= ~mask;
}
/**
* @dev Sets `length` bits starting at `index` to the boolean `value`.
*/
function setTo(BitMap storage bitmap, uint256 index, uint8 length, bool value) internal {
if (length == 0) {
return;
}
if (value) {
set(bitmap, index, length);
} else {
unset(bitmap, index, length);
}
}
/**
* @dev Sets `length` bits starting at `index`.
*/
function set(BitMap storage bitmap, uint256 index, uint8 length) internal {
if (length == 0) {
return;
}
uint256 bucket = index / 256;
uint256 start = index % 256;
uint256 end = start + length;
if (end <= 256) {
uint8 leftZeroes = uint8(start);
uint8 rightZeroes = uint8(256 - end);
uint256 mask = mkmask(leftZeroes, rightZeroes);
setBucket(bitmap, bucket, mask);
} else {
uint8 leftZeroes1 = uint8(start);
uint8 rightZeroes1 = uint8(0);
uint256 mask1 = mkmask(leftZeroes1, rightZeroes1);
setBucket(bitmap, bucket, mask1);
uint8 leftZeroes2 = uint8(0);
uint8 rightZeroes2 = uint8(512 - end);
uint256 mask2 = mkmask(leftZeroes2, rightZeroes2);
setBucket(bitmap, bucket + 1, mask2);
}
}
/**
* @dev Unsets `length` bits starting at `index`.
*/
function unset(BitMap storage bitmap, uint256 index, uint8 length) internal {
if (length == 0) {
return;
}
uint256 bucket = index / 256;
uint256 start = index % 256;
uint256 end = start + length;
if (end <= 256) {
uint8 leftZeroes = uint8(start);
uint8 rightZeroes = uint8(256 - end);
uint256 mask = mkmask(leftZeroes, rightZeroes);
unsetBucket(bitmap, bucket, mask);
} else {
uint8 leftZeroes1 = uint8(start);
uint8 rightZeroes1 = uint8(0);
uint256 mask1 = mkmask(leftZeroes1, rightZeroes1);
unsetBucket(bitmap, bucket, mask1);
uint8 leftZeroes2 = uint8(0);
uint8 rightZeroes2 = uint8(512 - end);
uint256 mask2 = mkmask(leftZeroes2, rightZeroes2);
unsetBucket(bitmap, bucket + 1, mask2);
}
}
/**
* @dev Sets a `bucket` in the bitmap under the `mask`.
*/
function setBucket(BitMap storage bitmap, uint256 bucket, uint256 mask) internal {
bitmap._data[bucket] |= mask;
}
/**
* @dev Unsets a `bucket` in the bitmap under the `mask`.
*/
function unsetBucket(BitMap storage bitmap, uint256 bucket, uint256 mask) internal {
bitmap._data[bucket] &= ~mask;
}
/**
* @dev Make a bitmask with `leftZeros` number of left zero-bits and `rightZeros` number of right zero-bits.
*/
function mkmask(uint8 leftZeros, uint8 rightZeros) internal pure returns (uint256) {
return type(uint256).max << leftZeros >> leftZeros >> rightZeros << rightZeros;
}
}
Creating a PR.
Please don't. We would still need some internal discussion about this before pulling the trigger on that.
Creating a PR.
Please don't. We would still need some internal discussion about this before pulling the trigger on that.
No problem.
Another practical application is Bitmap-based NFT airdrops, in which a group of bits (say 5,000 - it would be the collection size) are initially set and subsequently unset when users mint the NFT. As numerous buckets are required to first set the bits, this new code would also be beneficial.
Note that because of the limitations of uint8, the set(BitMap storage bitmap, uint256 index, uint8 length)
and unset
functions are limited to a length of 255. This doesn't allow setting an entire bucket at once (a bucket has 256 items)
It would be nice to be able to set a full bucket.
We can either create a special function for this, or increase the uint8 for that reason.
If we increase the uint8, the next type is uint16... but then if someone provides the max value for this type, that implies updating 256 buckets, which is not ok.
I'm not a big fan of doing require(length <= 256)
so can we do start index and end index instead?
Start index must be uint256. I guess end index also needs to be uint256. Then we would need to check that startIndex < endIndex <= startIndex + 256
not sure that is any better
True, I actually in that case like the uint16 better, but overall does not get rid of the check. We already check for length == 0 though ...
length == 0
is a no-op, not an error. We can return without doing anything and that is valid.
require(length <= 256)
is different because it reverts, which is way worst. Even with documentation, there is always someone that calls the function with a parameter that is uint16 ... and the contract will work for a while until the value becomes too big. At that point the library reverts, and that could completely block the contract ... resulting in a potential loss of funds.
i was thinking of no-opping on <= 256 too.
but if no, we could:
1) write another function to set the full bucket
or
2) introduce a new variable, maybe a bool, saying "setAllBits" or something like that.
Or we redefine what "0" means in "length" (potentially renaming "length")
To summarize, this is a proposal to add a function setTo(BitMap storage bitmap, uint256 index, uint8 length, bool value)
that will set length
bits starting at index
to 1 or 0 according to value
.
I don't think it's a big problem if length=256 can't be expressed.
Thanks, @frangio.
It would be valuable having a way to set a bucket entirely to 1 or 0 (due to the airdrop example mentioned, and to make it less strange). Are you saying that we can still do this even with uint8? Or that we shouldn't support this?
Currently:
bitmap.setTo(0, 256, 1)
Would set all the bucket's bits to 1 but can't be done now.
Unless we let:
bitmap.setTo(0, 0, 1)
mean "set the bit in the first index to 1".
Then this would set all bits, and can be done: bitmap.setTo(0, 255, 1)
.
What are your thoughts on this?
Unless we let:
bitmap.setTo(0, 0, 1)
mean "set the bit in the first index to 1".
This is not intuitive and people might not read the docs well enough.
This would be helpful, e.g., in an NFT bitmap-based airdrop where we start with 1-bits and switch them to 0 as mints occur.
Please note that this is not good justification... Such a contract should start with 0-bits because it's the default value.
This is not intuitive and people might not read the docs well enough.
I agree. I do like the uint16 the best and perhaps no-op anything >256. seems cleanest and simplest.
Please note that this is not good justification... Such a contract should start with 0-bits because it's the default value.
The reason for starting at 1 and converting to 0 is due to gas savings. Setting non-zero to zero is the least costly and will save users lots of money. There are many use-cases like this.
Ok so what you're saying is that as the deployer you would set all to 1 so that users don't need to pay the cost of changing zero to non-zero? That's interesting.
What I'm starting to consider is that perhaps we should just allow length > 256 and alert people that using an unbounded length will consume unbounded gas. Do we have other cases of unbounded execution in the library currently?
Ok so what you're saying is that as the deployer you would set all to 1 so that users don't need to pay the cost of changing zero to non-zero? That's interesting.
Exactly.
What I'm starting to consider is that perhaps we should just allow length > 256 and alert people that using an unbounded length will consume unbounded gas.
I'm fully onboard with this.
Do we have other cases of unbounded execution in the library currently?
This was from a quick glance: https://github.com/OpenZeppelin/openzeppelin-contracts/blob/master/contracts/utils/structs/EnumerableSet.sol#L216
@frangio
Sorry @jesperkristensen58 this will not make it in the next release.
No problem; thanks for the update
š§ Motivation
This would be helpful, e.g., in an NFT bitmap-based airdrop where we start with 1-bits and switch them to 0 as mints occur. This would also be helpful in the case where a protocol is using bitpacking (e.g., this is directly useful in Panoptic), as in: https://twitter.com/cryptojesperk/status/1613207330782416897?s=20&t=dDcU6hQPJEGzoo8sX8PJ8A
š Details These new functions are added via function overloading (the argument list is changed from taking an
index
to astartIndex
andnum
):Where "startIndex" is the starting index of the first bit to interact with. "num" is the number of bits to interact with (
set
orunset
) from the startIndex onwards.Consider the following 256 bits (32 bytes, or 1 word) example:
Currently, we can set or unset 1 bit at a time. With these new functions, we can set n bits at once.
For example:
sets the bits starting at index 5 to 15 to 1 - and leaves the rest at 0 - in a single function call:
And we can get those bits back:
Returns the bits:
The bits can be returned in various formats to be decided. Potentially as an array of uint256s. An event would be emitted also detailing the start index and the number of bits that are returned.
The difficult parts will be when bits are set across overlapping _data elements, but can be handled, among other complex situations (which can all be handled).