GNSPS / solidity-bytes-utils

Utility Solidity library composed of basic operations for tightly packed bytes arrays
The Unlicense
514 stars 109 forks source link

Use keccak to compare arrays #69

Open CodeSandwich opened 7 months ago

CodeSandwich commented 7 months ago

BytesLib.equal and BytesLib.equal_nonAligned can be replaced with a much simpler, more gas efficient and purely Solidity (YUL probably won't bring any benefits here): return _preBytes.length == _postBytes.length && keccak256(_preBytes) == keccak256(_postBytes);.

GNSPS commented 7 months ago

The idea for the library was to provide a different alternative to the hashing option. That is dependent on the hash implementation versus having computation done on each of the bytes directly. I don't think I'd ever mask a simple hash comparison behind a lib, too.

In addition, have you tested and made sure it is more gas-efficient?

CodeSandwich commented 7 months ago

The idea for the library was to provide a different alternative to the hashing option. That is dependent on the hash implementation versus having computation done on each of the bytes directly. I don't think I'd ever mask a simple hash comparison behind a lib, too.

That's interesting, what for and why? Is hashing insecure?

Regarding gas usage, hashing costs 30 gas + 6 per 32-byte word, there's nothing to benchmark really, hashing is probably the cheapest thing you can do with an array of bytes in memory.

GNSPS commented 7 months ago

That's interesting, what for and why? Is hashing insecure?

Not really, but we're just giving optionality, I guess. I'm not afraid the keccak opcode is badly implemented in Geth.

But that's just the gas usage of the opcode. What about changing the memory pointer, filling the memory in the scratch space, etc.? Because my code does everything in place, we don't even touch the 0x0 - 0x40 scratch space like the Solidity compiler does to hash stuff together. What if you want to hash something that is bigger than 32 bytes? You need to copy multiple times. I'd say that the bigger the array, the more efficient my function is compared to hashing stuff together.

This being said, I never benchmarked it, so you may be right. I'd love to see some data to believe the statement, though.

CodeSandwich commented 7 months ago

Keccak works in-place, on an arbitrary memory range, check https://www.evm.codes. The scratch space is used mostly for calculating the storage slot indexes, because it involves repeated hashing of pairs of 32-byte words taken from the stack.

It's your project, so I can't tell you what to do, you'll do whatever you choose. I don't even use your code as a dependency, so I won't see the benefits of any optimization. All I'm saying is that you should check out hashing because it's a fairly popular strategy for reducing gas usage. Plus you'll drop a lot of complex Yul code.

GNSPS commented 7 months ago

I just benchmarked it, and you're totally right. Leaving the data here for posterity.

This is for a 4-byte-long array comparison with Keccak:

image

This is for the same comparison with my function:

image

Trying longer lengths shows that my function actually scales just a smidge slightly worse than the Keccak comparison.

With this being said, if the lengths do not match, my function is O(1) and performs better than the Keccak counterpart in any case.

To be honest, I've done this such a long time ago that I don't even know what I was thinking now, about memory manipulation of the Keccak opcode. It probably had something to do with the storage slot calculations, like you mentioned. 🤷‍♂️

Although this is all true, I still don't think I should change this function. First, because if using the Keccak version is what people want to do, using it through a library makes no sense since it is a one-liner. Second, because I've made this lib such a long time ago (and it's still consumed by some major projects) that I'm sincerely afraid to change any implementations at this point (e.g., that's why I merged the PR for the non aligned comparison as a new function instead of adding to the previous one). Third, because I like how this still serves as a well-commented, teaching example of the inner workings of the EVM. Fourth, because it's another option. 😄

But I sincerely thank you for bringing this up! 🙏 At least now we have some comparisons and options. Also happy to keep this issue open so people notice it when arriving at the repo. 👍

CodeSandwich commented 7 months ago

With this being said, if the lengths do not match, my function is O(1) and performs better than the Keccak counterpart in any case.

From the very first proposed version in this issue hashing is done only if the lengths are different. Omitting this path doesn't yield meaningful results.

I couldn't believe it so I wrote a benchmark too, here are the results. Hashing has a predictable cost which is a useful property on blockchains, and is usually faster, often by a lot. The only edge case are arrays of over 256 bytes where the very first byte is different, for which the loop-based implementation has the O(1) cost, so at some point it gets better than hashing.

Source ```solidity pragma solidity ^0.8.24; import {console, Test} from "forge-std/Test.sol"; contract EqualTest is Test { function testBenchEqual() public view { for(uint i = 1; i < 4096; i *= 2) { _benchForLength(i); } } function _benchForLength(uint256 length) internal view { bytes memory data = new bytes(length); for(uint256 i = 0; i < length; i++) { data[i] = bytes1(uint8(block.prevrandao + i)); } console.log("Same for length", length); this.benchEqual(data, data); bytes memory differentFirst = bytes.concat(data); differentFirst[0] ^= 0xff; console.log("Different first for length", length); this.benchEqual(data, differentFirst); bytes memory differentLast = bytes.concat(data); differentLast[differentLast.length -1] ^= 0xff; console.log("Different last for length", length); this.benchEqual(data, differentLast); bytes memory differentLength = bytes.concat(data, bytes1(0)); console.log("Different length for length", length); this.benchEqual(data, differentLength); console.log("-----------------------"); console.log(""); } function benchEqual(bytes memory preBytes, bytes memory postBytes) external view { uint256 gasEqual = gasleft(); bool equalResult = BytesLib.equal(preBytes, postBytes); gasEqual -= gasleft(); uint256 gasEqualNonAligned = gasleft(); bool equalNonAlignedResult = BytesLib.equal_nonAligned(preBytes, postBytes); gasEqualNonAligned -= gasleft(); uint256 gasEqualHash = gasleft(); bool equalHashResult = BytesLib.equalHash(preBytes, postBytes); gasEqualHash -= gasleft(); if(equalResult != equalNonAlignedResult) console.log("DIFFERENT ALIGNED AND NON ALIGNED!"); assertEq(equalResult, equalHashResult, "Invalid hash result"); console.log("Gas equal: ", gasEqual); console.log("Gas equal non aligned:", gasEqualNonAligned); console.log("Gas equal hash :", gasEqualHash); console.log(""); } } library BytesLib { function equalHash(bytes memory _preBytes, bytes memory _postBytes) internal pure returns (bool) { return _preBytes.length == _postBytes.length && keccak256(_preBytes) == keccak256(_postBytes); } function equal(bytes memory _preBytes, bytes memory _postBytes) internal pure returns (bool) { bool success = true; assembly { let length := mload(_preBytes) // if lengths don't match the arrays are not equal switch eq(length, mload(_postBytes)) case 1 { // cb is a circuit breaker in the for loop since there's // no said feature for inline assembly loops // cb = 1 - don't breaker // cb = 0 - break let cb := 1 let mc := add(_preBytes, 0x20) let end := add(mc, length) for { let cc := add(_postBytes, 0x20) // the next line is the loop condition: // while(uint256(mc < end) + cb == 2) } eq(add(lt(mc, end), cb), 2) { mc := add(mc, 0x20) cc := add(cc, 0x20) } { // if any of these checks fails then arrays are not equal if iszero(eq(mload(mc), mload(cc))) { // unsuccess: success := 0 cb := 0 } } } default { // unsuccess: success := 0 } } return success; } function equal_nonAligned(bytes memory _preBytes, bytes memory _postBytes) internal pure returns (bool) { bool success = true; assembly { let length := mload(_preBytes) // if lengths don't match the arrays are not equal switch eq(length, mload(_postBytes)) case 1 { // cb is a circuit breaker in the for loop since there's // no said feature for inline assembly loops // cb = 1 - don't breaker // cb = 0 - break let cb := 1 let endMinusWord := add(_preBytes, length) let mc := add(_preBytes, 0x20) let cc := add(_postBytes, 0x20) for { // the next line is the loop condition: // while(uint256(mc < endWord) + cb == 2) } eq(add(lt(mc, endMinusWord), cb), 2) { mc := add(mc, 0x20) cc := add(cc, 0x20) } { // if any of these checks fails then arrays are not equal if iszero(eq(mload(mc), mload(cc))) { // unsuccess: success := 0 cb := 0 } } // Only if still successful // For <1 word tail bytes if gt(success, 0) { // Get the remainder of length/32 // length % 32 = AND(length, 32 - 1) let numTailBytes := and(length, 0x1f) let mcRem := mload(mc) let ccRem := mload(cc) for { let i := 0 // the next line is the loop condition: // while(uint256(i < numTailBytes) + cb == 2) } eq(add(lt(i, numTailBytes), cb), 2) { i := add(i, 1) } { if iszero(eq(byte(i, mcRem), byte(i, ccRem))) { // unsuccess: success := 0 cb := 0 } } } } default { // unsuccess: success := 0 } } return success; } } ```
Results for via-ir=false and 200 optimizer runs ``` Same for length 1 Gas equal: 289 Gas equal non aligned: 368 Gas equal hash : 208 Different first for length 1 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different last for length 1 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different length for length 1 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 2 Gas equal: 289 Gas equal non aligned: 455 Gas equal hash : 208 Different first for length 2 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different last for length 2 Gas equal: 305 Gas equal non aligned: 471 Gas equal hash : 208 Different length for length 2 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 4 Gas equal: 289 Gas equal non aligned: 629 Gas equal hash : 208 Different first for length 4 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different last for length 4 Gas equal: 305 Gas equal non aligned: 645 Gas equal hash : 208 Different length for length 4 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 8 Gas equal: 289 Gas equal non aligned: 977 Gas equal hash : 208 Different first for length 8 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different last for length 8 Gas equal: 305 Gas equal non aligned: 993 Gas equal hash : 208 Different length for length 8 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 16 Gas equal: 289 Gas equal non aligned: 1673 Gas equal hash : 208 Different first for length 16 Gas equal: 305 Gas equal non aligned: 384 Gas equal hash : 208 Different last for length 16 Gas equal: 305 Gas equal non aligned: 1689 Gas equal hash : 208 Different length for length 16 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 32 Gas equal: 289 Gas equal non aligned: 281 Gas equal hash : 208 Different first for length 32 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 305 Gas equal non aligned: 281 Gas equal hash : 208 Different last for length 32 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 305 Gas equal non aligned: 281 Gas equal hash : 208 Different length for length 32 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 64 Gas equal: 382 Gas equal non aligned: 374 Gas equal hash : 220 Different first for length 64 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 220 Different last for length 64 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 398 Gas equal non aligned: 374 Gas equal hash : 220 Different length for length 64 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 128 Gas equal: 568 Gas equal non aligned: 560 Gas equal hash : 244 Different first for length 128 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 244 Different last for length 128 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 584 Gas equal non aligned: 560 Gas equal hash : 244 Different length for length 128 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 256 Gas equal: 940 Gas equal non aligned: 932 Gas equal hash : 292 Different first for length 256 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 292 Different last for length 256 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 956 Gas equal non aligned: 932 Gas equal hash : 292 Different length for length 256 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 512 Gas equal: 1684 Gas equal non aligned: 1676 Gas equal hash : 388 Different first for length 512 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 388 Different last for length 512 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 1700 Gas equal non aligned: 1676 Gas equal hash : 388 Different length for length 512 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 1024 Gas equal: 3172 Gas equal non aligned: 3164 Gas equal hash : 580 Different first for length 1024 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 580 Different last for length 1024 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 3188 Gas equal non aligned: 3164 Gas equal hash : 580 Different length for length 1024 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- Same for length 2048 Gas equal: 6148 Gas equal non aligned: 6140 Gas equal hash : 964 Different first for length 2048 Gas equal: 305 Gas equal non aligned: 325 Gas equal hash : 964 Different last for length 2048 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 6164 Gas equal non aligned: 6140 Gas equal hash : 964 Different length for length 2048 Gas equal: 140 Gas equal non aligned: 140 Gas equal hash : 95 ----------------------- ```
Results for via-ir=true and 200 optimizer runs ``` Same for length 1 Gas equal: 266 Gas equal non aligned: 369 Gas equal hash : 211 Different first for length 1 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different last for length 1 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different length for length 1 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 2 Gas equal: 266 Gas equal non aligned: 457 Gas equal hash : 211 Different first for length 2 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different last for length 2 Gas equal: 294 Gas equal non aligned: 485 Gas equal hash : 211 Different length for length 2 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 4 Gas equal: 266 Gas equal non aligned: 633 Gas equal hash : 211 Different first for length 4 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different last for length 4 Gas equal: 294 Gas equal non aligned: 661 Gas equal hash : 211 Different length for length 4 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 8 Gas equal: 266 Gas equal non aligned: 985 Gas equal hash : 211 Different first for length 8 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different last for length 8 Gas equal: 294 Gas equal non aligned: 1013 Gas equal hash : 211 Different length for length 8 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 16 Gas equal: 266 Gas equal non aligned: 1689 Gas equal hash : 211 Different first for length 16 Gas equal: 294 Gas equal non aligned: 397 Gas equal hash : 211 Different last for length 16 Gas equal: 294 Gas equal non aligned: 1717 Gas equal hash : 211 Different length for length 16 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 32 Gas equal: 266 Gas equal non aligned: 281 Gas equal hash : 211 Different first for length 32 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 294 Gas equal non aligned: 281 Gas equal hash : 211 Different last for length 32 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 294 Gas equal non aligned: 281 Gas equal hash : 211 Different length for length 32 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 64 Gas equal: 360 Gas equal non aligned: 387 Gas equal hash : 223 Different first for length 64 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 223 Different last for length 64 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 388 Gas equal non aligned: 387 Gas equal hash : 223 Different length for length 64 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 128 Gas equal: 548 Gas equal non aligned: 599 Gas equal hash : 247 Different first for length 128 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 247 Different last for length 128 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 576 Gas equal non aligned: 599 Gas equal hash : 247 Different length for length 128 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 256 Gas equal: 924 Gas equal non aligned: 1023 Gas equal hash : 295 Different first for length 256 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 295 Different last for length 256 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 952 Gas equal non aligned: 1023 Gas equal hash : 295 Different length for length 256 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 512 Gas equal: 1676 Gas equal non aligned: 1871 Gas equal hash : 391 Different first for length 512 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 391 Different last for length 512 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 1704 Gas equal non aligned: 1871 Gas equal hash : 391 Different length for length 512 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 1024 Gas equal: 3180 Gas equal non aligned: 3567 Gas equal hash : 583 Different first for length 1024 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 583 Different last for length 1024 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 3208 Gas equal non aligned: 3567 Gas equal hash : 583 Different length for length 1024 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- Same for length 2048 Gas equal: 6188 Gas equal non aligned: 6959 Gas equal hash : 967 Different first for length 2048 Gas equal: 294 Gas equal non aligned: 342 Gas equal hash : 967 Different last for length 2048 DIFFERENT ALIGNED AND NON ALIGNED! Gas equal: 6216 Gas equal non aligned: 6959 Gas equal hash : 967 Different length for length 2048 Gas equal: 74 Gas equal non aligned: 115 Gas equal hash : 87 ----------------------- ```

The runs marked as DIFFERENT ALIGNED AND NON ALIGNED! have equal and equal_nonAligned yielding different results. I think that regardless of switching to hashing, you should completely replace the equal function with equal_nonAlighed, the used algorithm just returns invalid values. Its not even a good teaching example, it's a bug, maybe the auditors will learn a common pitfall, but that's all. Keeping it as it is doesn't help the downstream, it leaves buggy code unpatched and makes the obvious choice wrong, being correct should not be the special case.

GNSPS commented 7 months ago

I mean, the normal function works as long as you use vanilla Solidity or don't use Yul to mess with the memory pointer manually. That was always its intended use case. In fact its first version was reviewed together with the compiler team. I'm guessing only in situations where the array is non-aligned to 32-byte slots, you got the different values? If not, that actually is signalling a bug in one of the functions, and I'd love it if you could share the test code with me so I could debug.

I could consider changing the non-aligned function to become the main one, though, given we don't care about the gas savings.

Was I right about the above or should I be worried about a bug in the functions?

GNSPS commented 7 months ago

Also, please note this library is 7 years old and, again, I am terribly afraid of breaking shit downstream at this point. 🙏

CodeSandwich commented 7 months ago

share the test code with me so I could debug

It's the code I shared above, just run it with forge test -vv and see which cases print DIFFERENT ALIGNED AND NON ALIGNED!. It seems that it happens when the arrays lengths are a multiple of 32, and the only difference between them is the last byte. Unless via-ir is used, then sometimes even a different first byte triggers the error.

GNSPS commented 7 months ago

Sorry! 😅🙈 Completely ignored the “Source” tab. I’ll take a look! 🙏

CodeSandwich commented 7 months ago

I found the bug. In equal_nonAligned numTailBytes should never be 0, for cases where it is, it should be set to 32. This is because endMinusWord always leaves 1-32 unchecked bytes, never 0.

IMO this function is extremely overcomplicated, even forgetting about hashing for a moment. Instead of having a special routine for comparing the partial last word, you could just replace the initial let mc := add(_preBytes, 0x20) with let mc := add(_preBytes, and(mload(_preBytes), 0x1F)) and replace endMinusWord with the regular let end := add(_preBytes, add(length, 32)). This way you will check part of the length word twice, but it won't affect the result. The cb variable can also be dropped, it's always the same as success.

GNSPS commented 7 months ago

My current span of attention for this being so thin was the reason why I merged this PR into a different function. I am actually happy I didn't merge or replace the normal equal function for this one like the contributor asked me to.

I need some proper time to review the bug and your proposal. But this is great, and I can't thank you enough for taking such a close look! 🙏