ERC721URIStorage: add internal getter for _tokenURIs

🧐 Motivation This would allow much easier (and cheaper) implementations of an hasTokenURI helper.

📝 Details Currently the ERC721URIStorage._tokenURIs variable is defined as private, which prevents inheriting contracts from directly accessing the data stored in it.
Additionally, when the _baseURI method provides a string but the token itself has no URI set, the ERC721URIStorage.tokenURI method ends up returning the super/ERC721 implementation, where the _baseURI is concatenated with tokenId resulting in a non-empty string. Checking against this return value to know if the URI is already set is unnecessarily costly.

The change in visibility to internal would allow for a significantly simpler implementation of this check, something like:

abstract contract ERC721URIStorage is ERC721 {
    using Strings for uint256;

    // Optional mapping for token URIs
    mapping(uint256 => string) internal _tokenURIs;
...

function _hasTokenURI(uint256 tokenId) internal view returns (bool) {
    return bytes(_tokenURIs[tokenId]).length > 0;
}

Would this raise security concerns?

Alternatives built into ERC721URIStorage.tokenURI could also be considered to avoid this change, and this led me to wonder if the current behaviour is really to be expected of the extension contract.

What I mean is that, being (to my understanding) the extension based on the premise that the user wants to add an extra layer of abstraction on top of the base ERC721.tokenURI functionality, returning the super method might not be ideal. Instead, the user could expect the extension to notify them when the URI is not set for a token, implicitly by returning an empty string.

This could be implemented with a simple check, avoiding super altogether. (based on v4.3.2)

abstract contract ERC721URIStorage is ERC721 {

  ...

  function tokenURI(uint256 tokenId) public view virtual override returns (string memory) {
    require(_exists(tokenId), "ERC721URIStorage: URI query for nonexistent token");

    string memory _tokenURI = _tokenURIs[tokenId];

    // If there is no URI set, return empty string
    if (bytes(_tokenURI).length == 0) {
      return "";
    }

    string memory base = _baseURI();

    // If there is no base URI, return the token URI.
    if (bytes(base).length == 0) {
      return _tokenURI;
    }

    // We got here, so both token's and base URIs are set
    // concatenate the baseURI and tokenURI (via abi.encodePacked).
    return string(abi.encodePacked(base, _tokenURI));

  }

  ...

}

Apologies if none of this makes any sense, I'm quite new to Solidity and still getting to know my way around a lot of things.
Your code helped me so much and I'd love to give back, so if anything sounds interesting I can work out a PR to improve on what I discussed above.

Hello @nxet

First of all, we understand that having storage private is sometimes difficult to work with, but this is a base rule for us and we are going to change it.

Now, given the behaviour that we implement in ERC721URIStorage I'm not sure how changing the storage is an issue. I think the question here is what behaviour do you need?

If you need something different ... then you should possibly write your own custom module on top of ERC721

The behaviour I'm looking to implement is a simple and cheap way to check if a token has the URI set already.

Changing the storage visibility to internal would allow to implement a very simple hasTokenURI check.
The second option I put in the comment, the change in behaviour I was discussing, simply proposed that when the URI is not set for a token, the ERC721URIStorage.tokenURI getter returns an empty string instead of ERC721.tokenURI.

I totally didn't mean to sound as bad as your reply made it look like, so apologies for anything that came off strong. I'm most definitely not criticizing your design choices, simply trying to better understand them.
Of course I'm writing plenty of implementations on top of your contracts, and to be fair I will simply compile my contract with a modified version of your ERC721URIStorage. But again, I figured this could have been an interesting point of view and wanted to share it with the community.

I'm really sorry if my initial reply made your question sound bad. We always appreciate users questioning the contract mechanism, which in some cases (like this one) are definitely not fitting all needs, and can be questioned.

You are interested by an approach where:

when the URI is not set for a token, the ERC721URIStorage.tokenURI getter returns an empty string instead of ERC721.tokenURI

IMO ERC721URIStorage meet this requirement out of the box.

If you have a look at ERC721, you will see that the implementation for _baseURI() is the following

function _baseURI() internal view virtual returns (string memory) {
    return "";
}

note that it doesn't read from storage. It cost practically nothing to run, and the compiler can optimize that a lot (inlining)

This means that the implementation in ERC721URIStorage will do

function tokenURI(uint256 tokenId) public view virtual override returns (string memory) {
        require(_exists(tokenId), "ERC721URIStorage: URI query for nonexistent token");

        string memory _tokenURI = _tokenURIs[tokenId];
        string memory base = _baseURI(); // ← this doesn't read from storage, just inlining an empty string

        // If there is no base URI, return the token URI.
        if (bytes(base).length == 0) { // ← this will always be the case, and the optimizer can see that
            return _tokenURI;
        }
        // EVERYTHING BELLOW THAT IS DEAD CODE
        // If both are set, concatenate the baseURI and tokenURI (via abi.encodePacked).
        if (bytes(_tokenURI).length > 0) {
            return string(abi.encodePacked(base, _tokenURI));
        }

        return super.tokenURI(tokenId);
}

i.e

function tokenURI(uint256 tokenId) public view virtual override returns (string memory) {
        require(_exists(tokenId), "ERC721URIStorage: URI query for nonexistent token");
        return _tokenURIs[tokenId];
}

So you may want to rewrite some of it, but actually, you don't need to. Just trust the optimizer, and understand that tokenURI(uint256) is almost as good as an accessor to the underlying storage. It adds the cost of the _exist(uint256) check, but I don't think its that bad.

If you want to optimize further, then by all means fork the ERC721URIStorage and use a version with an internal accessor ... or a version with the tokenURI(uint256) cleaned up.

First of all thank you for this reply, it sets a totally different tone for the conversation and is deeply appreciated.
Plus, it made me see that there is still a fundamental misunderstanding between us. I'm sure it stems from my limited understanding of the environment, and I hope you can further help me get things straight.

As stated before, the main objective is to be able to cheaply tell if a token has the URI set. I'm finding it impossible to do with the current implementations of ERC721 and ERC721URIStorage and the way they interact together.

Using the current release of OZ, some example code would look like:

contract MyToken is ERC721, ERC721URIStorage {

  ...

  // override baseURI
  function _baseURI() internal view override(ERC721) returns (string memory) {
    return "https://ipfs.io/ipfs/";
  }

  /**
   * @dev Set `tokenId` URI string
   * NB: This prevents overwrites, requiring `tokenId` to have no URI
   */
  function setTokenURI(uint256 tokenId, string memory uri) internal {
    require(
      _hasTokenURI(tokenId) == false,
      "Token URI can not be overwritten"
    );
    _setTokenURI(tokenId, uri); // this will require(exists)
  }

  /**
   * @dev Check if `tokenId` URI is set, with current implementation
   */
  function _hasTokenURI(uint256 tokenId) internal view returns (bool) {

    // Assuming we're calling this method for a token
    // with the URI unset in the ERC721URIStorage._tokenURIs storage,
    // i.e. _tokenURIs[tokenId] is an empty string.
    // This call returns 'https://ipfs.io/ipfs/{tokenId}',
    // because ERC721URIStorage.tokenURI:
    // - requires(exist)
    // - checks if base.length == 0, but continues
    // - checks if tokenURI.length > 0, but continues
    // - returns super(), which:
    //   - requires(exist)
    //   - checks if base.length > 0, and concatenates base with tokenId
    string memory current = tokenURI(tokenId);

    // at this point we have a token with the URI still unset in the storage
    // i.e. _tokenURIs[tokenId] is still an empty string
    // but the result we received is a full URI, which can be misleading

    // the only way to check if the URI is an autogenerated string, rather than
    // a deliberately set one, is to recreate the placeholder URI and check
    // against the result we received above, basically calling `ERC721.tokenURI`
    // once again

    // of course here we can skip all the checks, since we know that baseURI is set
    string memory autogenerated = string(
      abi.encodePacked(
        _baseURI(), tokenId.toString()
      )
    );

    // at this point, check that the "stored" URI is not autogenerated
    return keccak256(bytes(current)) != keccak256(bytes(autogenerated));

  }

}

Here you can see that the costs involved in creating a custom hasTokenURI check extend far beyond the extra require, but rather lie in how the storage is either 1. handled or 2. presented. This forces the user to provide extra logic to check if the value returned by tokenURI represents a token without a URI stored in _tokenURIs.

My idea for 1 was a simple change in visibility from private to internal for the _tokenURIs variable. Again this would allow for direct access from inheriting contracts, making the hasTokenURI getter implementation a trivial one-liner.

function _hasTokenURI(uint256 tokenId) internal view returns (bool) {
  return bytes(_tokenURIs[tokenId]).length > 0;
}

Due to my evident lack of knowledge, I honestly fail to understand the implications of this change, and I'd be grateful for any further insight on the reasoning that led OZ to this design choice.

The change in behavior proposed for 2 is instead for ERC721URIStorage.tokenURI to explicitly return an empty string when the URI is not set in storage, i.e. _tokenURIs[tokenId] == "".
In this regard I do understand how this would need a lot more consideration, but the resulting _hasTokenURI helper would once again be a simple one-liner:

function _hasTokenURI(uint256 tokenId) internal view returns (bool) {
  return bytes(tokenURI(tokenId)).length > 0;
}

Once again, let me say that I'm new to this space and trying daily to improve my knowledge about the inner mechanics of Solidity.
There's a high chance I'm overlooking some crucial aspects of the language design, which would make my reasoning invalid.
And if that's the case, please be kind enough to point me in the right direction (i.e. links to docs) for a better understanding.

Thanks once again for your patience

This piece of code makes you request much clearer. It appears (and correct me if I'm wrong) that:

by default you want to use the token id as a ipfs hash ... which is strange since the tokenId will appear in a decimal format, which is not a valid base58 ipfs hash
have a mechanism that allows "someone" to change the URI from the default to something that can not be updated later on, but that still has to be an ipfs hash

We could add a _hasTokenURI function to ERC721URIStorage, but before we do that I'd like to understand your workflow better ... because I'm not sure that your usage should really be based on ERC721URIStorage. In particular, I don't think it makes any sens to have a default value that is not a valid IPFS hash. On the other hand, If your URI are all ipfs hash, I'm not sure if you need to set it manually for all tokens anyway.

As an example, I would take this contract. It has a base URI that is ipfs://QmX9wDSPVGXfWkmCzPRRzNPUUrmXFNXgdKBdTbhTrrbAPV/ (note that it uses pure ipfs, and not a centralized ipfs provider). This is the hash of a folder, with files numbered from 0 to 9999. That way they don't need to store individual URI for each token, and the default ERC721 behavior is sufficient.

Could that be a cheaper solution for you? If not, do you intend to use both the default resolution (fallback) and custom URIs ?

I'm glad the last example has been helpful to better explain myself.

by default you want to use the token id as a ipfs hash ... which is strange since the tokenId will appear in a decimal format, which is not a valid base58 ipfs hash

No, this is exactly the behaviour I intend to change. I dont want the URI to be an invalid IPFS url, as mistakenly concatenated by ERC721.
My goal is to have a common base uri for all tokens, in this case the IPFS endpoint, which is then concatenated with a string I will store in _tokenURIs.
But, crucially, the _tokenURIs[tokenId] value will NOT be set at mint, and will instead be provided at later date with a public method wrapping ERC721URIStorage._setTokenURI (as shown above). The tokens will be minted and left without the URI, until an external action will set it to a valid IPFS hash.

As you can see, in this scenario it is very problematic to have ERC721URIStorage.tokenURI returning an invalid autogenerated URI. And the workaround I proposed is not DRY, extremely unsatisfactory, not to mention costly.

As an example, I would take this contract...

This example is very interesting, and something I didn't think of at all. Will definitely keep in mind this approach to IPFS storage.
Unfortunately (and this is totally unrelated to the question at hand) this would bring along major changes in the design of other components (outside of the smart contracts suite) of my project.

We could add a _hasTokenURI function to ERC721URIStorage

Indeed this would be a useful helper to add to the interface, but I didn't want to mention it because of the (negligible) increase in footprint size of the extension contract, which would then be passed on to users which might honestly never need it.
By the same metric, proposal 1 would basically leave things unchanged, and proposal 2 might even reduce both deploy and operation costs since there would be one less require necessary (being super left out of the logic).
But then again, my analysis mostly takes in account gas costs, and might overlook serious security concerns.

Later today I will work out a couple improved examples to better explain my suggestions, but mostly to provide tangible code (instead of snippets spread across the whole discussion) to help better discuss and eventually move forward with the issue.

Again, thanks for your time

If you dont the tokenURI method to return base + tokenId when the token specific uri is not set, then you shouldn't be using ERC721URIStorage.

maybe something like

abstract contract ERC721CustomURI is ERC721 {
    using Strings for uint256;

    mapping(uint256 => string) private _tokenURIs;

    function tokenURI(uint256 tokenId) public view virtual override returns (string memory) {
        string memory _tokenURI = _tokenURIs[tokenId];
        require(_tokenURI.length != 0);
        return string(abi.encodePacked(_baseURI(), _tokenURI));
    }

    function _isTokenURISet(uint256 tokenId) internal virtual view returns (bool) {
        return bytes(_tokenURIs[tokenId]).length != 0
    }

    function _setTokenURI(uint256 tokenId, string memory _tokenURI) internal virtual {
        require(_exists(tokenId), "ERC721URIStorage: URI set of nonexistent token");
        _tokenURIs[tokenId] = _tokenURI;
    }

    function _burn(uint256 tokenId) internal virtual override {
        super._burn(tokenId);

        if (_isTokenURISet(tokenId)) {
            delete _tokenURIs[tokenId];
        }
    }
}

Let me just say this is frankly getting quite frustrating. I hoped it was clear at this point I'm not here asking for hastily developed examples on how to do the homework, but rather gain better insight about the framework I'm operating in and understand why the assumptions that brought me here were wrong.
The patronising attitude and deliberate overlooking of the points made is really disheartening, above all because there are questions left unanswered (i.e. would the change in visibility raise security concerns).

Back on topic, you say:

If you dont the tokenURI method to return base + tokenId when the token specific uri is not set, then you shouldn't be using ERC721URIStorage.

But what you just described is precisely what the base ERC721 implementation already does.
User doesn't need to inherit from ERC721URIStorage to achieve the behaviour you are describing here or providing in the example above: a simple override of the ERC721._baseURI method returning a custom string (i.e. IPFS folder URI) would then provide a return value of ERC721.tokenURI composed of baseURI + tokenId.
Shouldn't this be is a scenario where there's no point in inheriting from ERC721URIStorage, since the desired behaviour is what the base ERC721 implementation already does?
On the other hand, inheriting from a contract implementing a dedicated storage, ERC721URIStorage, IMO kind of implies that the user wants to primarily refer to the values stored in it (and know at a glance when the data isn't there).

So please, help me understand why this opinion is wrong.
I sincerely didn't mean to be rude with this comment, I just really want to learn a solid foundation upon which to base future interactions with the OZ contracts, and most of all avoid major misunderstandings starting from my very first line of code.

Thank you

I'm sorry if my tone sounds patronizing, this is really not the emotion I'm trying to convey. There might be some frustration that we are not understanding each other, but please be assured that I am trying to understand your request and answer it as best as possible. I'm sorry I've not been understanding it so far.

ERC721 includes a default implementation of URI, which we believe is the most common. You will note that this default implementation does not rely on any storage, so it's as effective as possible, and so it can be overridden with no hidden cost. Our logic here is that if someone wants a custom logic, they will just override the public tokenURI function, and the internal _basURI function will be optimized out (same for the library calls).

In other words, if you don't like the default tokenURI, your can swap it easily and you shouldn't have to pay any costs for the default one that you are no longer using.

ERC721URIStorage is a more expensive version, that is just here to replicate the behavior we had in an earlier version of contracts (3.x.x). Again, its designed to be somehow configurable (trough the baseURI) ... but it remains an opt-in solution for people that want/like the old behaviour.

In other words, if ERC721URIStorage doesn't clearly meet your demand, you should just not be using it. And when I say "you" I don't mean you personally, I mean anyone that is building an ERC721 contract.

As developers of OZ, we have some key rules that apply here:

Maintain backward compatibility, either in the contract itself or optionally through option extensions.
Have the smallest core possible, not forcing anyone to ship code they don't use.
Protect against the direct edition of the storage (all updates should go through "setters") ... this is not critical here, but it is in other places, and we like consistency in the application of our rules.
Let users expand the contract, by making the function virtual, and making sure that things can be swapped (hopefully without leaving extra dead code behind)

In your particular case, you seem to want something that doesn't really fit the behavior of ERC721URIStorage (even if we added a viewer to the storage), just because ERC721URIStorage behavior is having a fallback that in your context would be invalid. This is why I've always thought that you would be better of writing your own URI management mechanism on top of the core ERC721. It is not because we don't want to help, it is because we think this is really the better option. You are the person that best understands your workflow and your needs, and that is in the better position to write a URI function that correspond to your needs ... and if you need help building that custom module, we are ready to provide that help, but please don't see that proposal as "hastily developed examples on how to do the homework"

I'm not sure why there was such a big misunderstanding in this thread.

@nxet's purpose could probably be accomplished with a custom implementation of ERC721URIStorage. For simple contracts like this, we encourage forking the code to add custom behavior. However, it's a fair point that there is no way to inspect whether the storage URI for a token ID has been set. There is no getter counterpart to _setTokenURI, because tokenURI concatenates the base URI.

I don't see an issue with including an additional internal getter. It does not add to bytecode size unless the function is used, so that's not an issue.

hey there, can i still contribute to this issue ?

OpenZeppelin / openzeppelin-contracts

ERC721URIStorage: add internal getter for _tokenURIs #3023