code423n4 commented 1 year ago

Lines of code

https://github.com/code-423n4/2023-03-canto-identity/blob/main/canto-bio-protocol/src/Bio.sol#L60-L95

Vulnerability details

Impact

The Bio tokenURI function splits biography text into lines. The algorithm will take into account certain "continuation" characters to prevent splitting the line in the middle of these characters and keep accumulating those in the current line buffer (bytesLines):

https://github.com/code-423n4/2023-03-canto-identity/blob/main/canto-bio-protocol/src/Bio.sol#L60-L95

if ((i > 0 && (i + 1) % 40 == 0) || prevByteWasContinuation || i == lengthInBytes - 1) {
    bytes1 nextCharacter;
    if (i != lengthInBytes - 1) {
        nextCharacter = bioTextBytes[i + 1];
    }
    if (nextCharacter & 0xC0 == 0x80) {
        // Unicode continuation byte, top two bits are 10
        prevByteWasContinuation = true;
    } else {
        // Do not split when the prev. or next character is a zero width joiner. Otherwise, 👨‍👧‍👦 could become 👨>‍👧‍👦
        // Furthermore, do not split when next character is skin tone modifier to avoid 🤦‍♂️\n🏻
        if (
            // Note that we do not need to check i < lengthInBytes - 4, because we assume that it's a valid UTF8 string and these prefixes imply that another byte follows
            (nextCharacter == 0xE2 && bioTextBytes[i + 2] == 0x80 && bioTextBytes[i + 3] == 0x8D) ||
            (nextCharacter == 0xF0 &&
                bioTextBytes[i + 2] == 0x9F &&
                bioTextBytes[i + 3] == 0x8F &&
                uint8(bioTextBytes[i + 4]) >= 187 &&
                uint8(bioTextBytes[i + 4]) <= 191) ||
            (i >= 2 &&
                bioTextBytes[i - 2] == 0xE2 &&
                bioTextBytes[i - 1] == 0x80 &&
                bioTextBytes[i] == 0x8D)
        ) {
            prevByteWasContinuation = true;
            continue;
        }
        assembly {
            mstore(bytesLines, bytesOffset)
        }
        strLines[insertedLines++] = string(bytesLines);
        bytesLines = new bytes(80);
        prevByteWasContinuation = false;
        bytesOffset = 0;
    }
}

However, if the unicode string is a sequence of these continuation characters (which is a valid UTF8 string) the line buffer (which is 80 bytes) will eventually overflow and will revert the transaction due to an index of out bounds exception.

Proof of Concept

In the following test we use a string with 21 U+1F3FE characters to overflow the line buffer and revert the query to tokenURI.

Note: the snippet shows only the relevant code for the test. Full test file can be found here.

function test_Bio_tokenURI_LineBufferOverflow() public {
    // This is a skin tone codepoint ("f09f8fbe") repeated 21 times
    string memory text = unicode"🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾🏾";
    bio.mint(string(text));
    vm.expectRevert();
    bio.tokenURI(1);
}

Recommendation

Change to a new line when the current line buffer is full.

c4-sponsor commented 1 year ago

OpenCoreCH marked the issue as sponsor confirmed

OpenCoreCH commented 1 year ago

Extreme edge case with a string that's technically valid, but semantically meaningless. But it's technically true.

c4-judge commented 1 year ago

0xleastwood marked the issue as primary issue

c4-judge commented 1 year ago

0xleastwood marked the issue as selected for report

code-423n4 / 2023-03-canto-identity-findings

Bio lines will overflow the buffer for repeated "continuation" characters #122

Lines of code

Vulnerability details

Impact

Proof of Concept

Recommendation