estarriolvetch / ERC721Psi


Zero gas overhead on mint for ERC721PsiRandomSeedReveal #12

Open reasv opened 1 year ago

reasv commented 1 year ago

I came across your blog posts about ERC721Psi while evaluating different ERC721 contracts to use, and I was impressed with how it works, so I decided to use it.

However, I couldn't really accept the gas overhead on minting when using ERC721PsiRandomSeedReveal, so I thought about how to remove it.

In this version, I got rid of any extra work in _safeMint() and consequently also removed ERC721PsiBatchMetaData from the dependencies.

Instead of using the "batch head" from the token mint, I set a genHead for each random seed generation. There's a bitmap _genHead whose bit gets set on the first token of each gen, just like ERC721PsiBatchMetaData did for the first token of each batch. Then, instead of _batchHeadtokenGen, there's a mapping genHeadToGenId that stores the mapping from the tokenId of each genHead to the corresponding reveal genId.

All the changes are done in _reveal() instead of _mint(): we just set _minted as the head of the next gen, along with the mapping from _minted to the next generation ID.

Generation 0 is set up in the contract constructor.
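Roughly, the bookkeeping added to _reveal() looks something like this (a simplified sketch only: the actual randomness request is omitted, and currentGen keeps getting incremented wherever the base contract does it today):

    // Sketch: the constructor additionally sets up generation 0 with
    //     _genHead.set(0);
    //     genHeadToGenId[0] = 0;

    function _reveal() internal virtual {
        // The next token to be minted becomes the head of the next generation,
        // and is mapped to the next generation id.
        _genHead.set(_minted);
        genHeadToGenId[_minted] = currentGen + 1;
        // ... unchanged: request the random seed for the generation that just closed ...
    }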

_tokenGen() works more or less the same as before, except that it finds the gen head instead of the batch head:

    // Find the nearest genHead at or before this token
    uint256 genHeadTokenId = _genHead.scanForward(tokenId);
    // Map the genHead's tokenId to its generation id
    return genHeadToGenId[genHeadTokenId];

Overall the logic is pretty much the same as before, but substituting the batch head with a single head per gen and doing all the work when a new generation starts. If one went through with this system, it would make sense to simplify it further and get rid of gen IDs and currentGen entirely, directly using the tokenId of the gen head in place of a gen ID. That would save a little gas during _reveal() and _tokenGen() and remove the need for the genHeadToGenId mapping entirely: genSeed would just map gen head token id => seed directly.
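As a rough illustration of that simplification (hypothetical and not part of this PR; the function name is made up, and the real per-token seed derivation is left out):

    // Hypothetical: generation seeds keyed directly by the gen head's tokenId,
    // no generation ids and no genHeadToGenId mapping.
    mapping(uint256 => uint256) private genSeed; // gen head tokenId => generation seed

    function _generationSeed(uint256 tokenId) internal view returns (uint256) {
        uint256 genHeadTokenId = _genHead.scanForward(tokenId);
        return genSeed[genHeadTokenId];
    }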

However, there is a good reason to keep the mapping between genHead and gen ID, because it enables a further optimization that I am going to discuss now.

The problem with this scheme is obvious: it's very fast if we _reveal() often, say every 256 NFTs minted, because _genHead.scanForward(tokenId) in _tokenGen() never has to go through many bit buckets to find the last head.

But if we minted 10000 NFTs and had a single generation, it would have to look back to ID 0 every time, costing at worst (for the last tokens) some ~40 storage reads (10000 / 256 ≈ 39) to walk the uint256 buckets all the way back to the beginning (I'm assuming each bucket maps 256 tokens, correct me if I am wrong).

Of course, 40 storage reads is not necessarily catastrophic, and it simply does not matter if seed() and the functions using it are only meant to be view functions read off-chain. In any case where there's no need to do on-chain work with the seed it's not an issue, and this version is, I think, strictly better than the one it replaces, due to the removal of the extra minting fees.

It's also possible to cut the amount of work by simply doing more generations. The more times we call _reveal() at evenly spaced intervals during minting, the better. Having one generation every 1000 NFTs already cuts the maximum number of storage lookups to an acceptable 4, I believe (1000 / 256 ≈ 4). And of course the NFTs closer to the previous head need fewer, so it's 2 or less for half of all NFTs in that case.

With that property in mind, I considered a modification of this optimization.

We can go back to doing what you did in the previous version and add code in the mint function to set the first NFT of a batch as a genHead and record it in the genHeadToGenId mapping:

    // Set the first NFT of the batch as a genHead for the current gen
    _genHead.set(_minted);
    genHeadToGenId[_minted] = currentGen;

It doesn't matter that there's already a genHead for the current generation: we can mark as many NFTs as heads as we want, and the genHeadToGenId mapping ensures we get the correct generation when looking up the seed. Of course, if we do this on every mint, we are back where we started.

But there's no need to do this on every mint. The point of setting some NFTs as extra genHeads is to reduce how far back we have to look to find the closest genHead, so we can just space them out. We could check _minted % 256 == 0 to get one genHead in every bucket, but because of batch minting we would miss some buckets, since the equality might never be true for a batch head.

But we can just check whether any NFT within the batch would have tokenId % 256 == 0 by testing 256 - (_minted % 256) < quantity (also treating _minted % 256 == 0, where the batch starts exactly on a boundary, as a hit):

    function _mint(
        address to,
        uint256 quantity
    ) internal virtual override {
        uint256 offset = _minted % 256;
        // If this batch starts on or crosses a 256-token bucket boundary,
        // record an extra genHead so later lookups never scan back more than one bucket.
        if (offset == 0 || 256 - offset < quantity) {
            // Set the first NFT of the batch as a genHead for the current gen
            _genHead.set(_minted);
            genHeadToGenId[_minted] = currentGen;
        }
        super._mint(to, quantity);
    }

This way only a few mints will cost more. It's a little unfair, but the overall gas savings seem huge to me, and I don't see any real cost to this approach. We can decide how often it happens by changing 256 to something else, like 1024, and only have about 10 mints (out of 10000) be more expensive, at the cost of slightly higher gas when looking up seed(). To be clear, this mint function override is not part of this PR; the PR only contains the previous part, without this, since the point was to reduce the minting overhead to zero, and this only makes it zero most of the time.

What do you think? I haven't tested any of this yet; the idea just came to me yesterday and I thought I'd share it.

For my own project, I am probably going to use a different approach that works better for my specific use case. As much as I like this bitmap-based solution, I will most likely only have one generation, maybe two; I just want optional support for multiple generations and intend to have a single reveal at the end. Multiple reveals are just a bonus feature that I might use but probably won't.

So my approach is to simply have a mapping genThreshold from gen ID => first token ID of the NEXT gen. Whenever I reveal, genThreshold[currentGen] is set to _minted. _tokenGen() just loops from generation 0 to the current gen, checking whether the tokenId is below each generation's threshold:

    function _tokenGen(uint256 tokenId) internal view returns (uint256) {
        require(_exists(tokenId), "ERC721PsiRandomSeedReveal: generation query for nonexistent token");
        // Find the generation the tokenId belongs to by checking it against genThreshold
        for (uint256 i = 0; i < currentGen; ++i) {
            if (tokenId < genThreshold[i]) {
                return i;
            }
        }
        return currentGen;
    }

So, this is O(n) with n being the number of generations we have. But it's by far the cheapest overall method if you only have one or a few generations: there's no bitmap, and there is never any mint overhead. It's pretty much unusable with many generations, whereas the initial proposal is the opposite, fast with many.
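For completeness, the reveal side of this variant is trivial; roughly (hypothetical sketch, the helper name is made up):

    mapping(uint256 => uint256) private genThreshold; // gen id => first token id of the NEXT gen

    // Called from the reveal flow: close the current generation at the current mint count.
    function _closeGeneration() internal {
        genThreshold[currentGen] = _minted;
        currentGen++;
    }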

To each use case their own solution I guess.

estarriolvetch commented 1 year ago

@reasv This idea sounds really good to me. I think this is indeed a good approach. Shifting the gas consumption from _mint() to _reveal() should give the users a more pleasant experience. We can also add administrative functions to fill _genHead if needed.
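For example, something along these lines (just a sketch assuming Ownable-style access control; the function name is made up):

    // Sketch of an administrative backfill: mark an existing token as an extra genHead
    // so later lookups don't have to scan as far back.
    function fillGenHead(uint256 tokenId) external onlyOwner {
        require(_exists(tokenId), "ERC721Psi: nonexistent token");
        // Resolve the token's generation before setting the bit, otherwise the scan
        // would stop at the new head before its mapping entry exists.
        uint256 genId = _tokenGen(tokenId);
        _genHead.set(tokenId);
        genHeadToGenId[tokenId] = genId;
    }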

Since this could be a breaking change, I think it would be better if you implement it as a separate extension.

Also, I think the one using genThreshold could be a separate extension as well. In practice, I don't think anyone will need more than 256 generations, so using a uint8 array for genThreshold can potentially save you more gas.

With corresponding test cases, I will be happy to merge your PR! Also, looking forward to seeing what you will build with ERC721Psi!!!

reasv commented 1 year ago

Thanks, I'm glad we're on the same page regarding this. And yes, you're right, they probably should be separate extensions. It seems the test mock breaks on this version anyway; I didn't expect to merge the PR in its current form.

I'll look into writing tests for these alternative versions in the next week, as soon as I have time. I'll have to write tests for the version I chose to use internally anyway.

Regarding "using uint8 array for genThreshold can potentially save you more gas": does it? I'm not sure; in most cases the EVM uses 256-bit words internally, and uint8 ends up costing a bit more due to the extra checks. The only situation where I'm aware it saves costs is when packing values into an array.

estarriolvetch commented 1 year ago

Awesome!

Regarding using uint8, you are correct that the EVM uses 256-bit words, so the cost will slightly increase if you only have one generation. However, if you have more than one generation, you will benefit from reading from the same storage slot, as uint8 numbers are packed in a compact form.

After the Berlin upgrade, each additional read of an already-accessed (warm) storage slot within the same transaction costs only 100 gas.

In my opinion, the downside of using uint8 is negligible, but it can potentially save a lot if you have a few generations.

estarriolvetch commented 1 year ago

Also, you may consider using a fixed-size array instead of a dynamic array to save more.
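For illustration (a hypothetical declaration, not code from the PR):

    // A fixed-size uint8 array packs 32 values into one 32-byte storage slot,
    // so after the first (cold) read in a transaction, reads of neighbouring
    // entries hit the same, already-warm slot at 100 gas each.
    uint8[32] private packedGenValues;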