Open nlordell opened 7 months ago
I can confirm this. I see that in the optimized Yul code the loop is identical in both cases and the difference is only visible in assembly. This means that it must be the effect of the optimized Yul->EVM transform. We fall back to the legacy transform if the assembly blocks are not memory-safe.
Honestly, I'm not sure if this is something we'll be able to fix. I think it's likely to not be a result of a bug that can be simply "fixed", but rather a trade-off that might affect other cases. We're generally working on improving the transform, so this may also improve as a result of the upcoming changes and might not make sense to target specifically. Need @ekpyron's opinion here.
The stack shuffling issue is specific to the two
staticcalls
to the SHA-256 precompile that follow. In particular, if you remove any of them, the problem is no longer observed and you get the expected behaviour of code withmemory-safe
assembly tags being slightly more performant than potentially memory unsafe assembly.
Interesting. I expected that this would be caused by stuff being optimized out without the calls, but no. When I comment out the staticcall
s the loop is still there, and still the same in both cases, just processed differently by the transform.
(Also on an unrelated note, I noticed that it is compiling
sub(shift, 6)
asadd(shift, 0xffff...ffa
which is 31 more code bytes and also seems like a minor bug...)
Maybe we should reconsider #6765 after all. I think it happens because Yul optimizer always converts sub()
to add()
and in this form this leaves the constant optimizer only the choice between add(shift, not(5))
(which is more space-efficient and chosen for lower runs
values) and add(shift, 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffa)
(which is cheaper to execute). Converting it back to sub(shift, 6)
would be a much better alternative, since it's both cheaper and more space-efficient.
I would not necessarily recommend it as a general solution, but as of 0.8.25 we still support the stackAllocation
option in Standard JSON, which lets you select the old Yul->EVM transform. Keep in mind though that we're going to deprecate it eventually.
CLI repro using Foundry. Assumes that the contract from the issue description is in a file called base64-safe.sol
.
export seed="grace crime cat remove spice bean concert lawsuit render horse collect vocal"
anvil --mnemonic "$seed"
export key=0x60b139825a56a987d58b20f0145e05dc45bed12df72cb92812b5ea988383c987
function benchmark_contract {
solc "$1".sol --via-ir --optimize --optimize-runs 100000 --bin |
sed -e '/^=======.*=======$/d' -e '/^Binary:$/d' -e '/^\s*$/d' > "$1".bin
local address
address=$(
cast send --json --private-key "$key" --create "$(cat "$1".bin)" |
jq --raw-output .contractAddress
)
gas=$(
cast send --json --private-key "$key" "$address" "bench()" |
jq --raw-output .gasUsed
)
echo "$((gas - 21000))"
}
benchmark_contract base64-safe
sed -e 's/("memory-safe")//g' base64-safe.sol > base64-unsafe.sol
benchmark_contract base64-unsafe
5104
4486
Thanks for the detailed response.
Another possible work around would be to split the code into separate functions, one with the loop and another with the static calls, and force the loop to not be optimized somehow (we can achieve this by having an additional external function that calls the loop function). I verified it with the above code and it had the desired effect.
Is there a better way to tell the Yul optimizer "please don't inline this function" without changing the ABI of the contract?
I did my best to ensure that this issue has not already been reported. I apologise if I missed an existing issue.
Description
I cam across some specific cases where tagging assembly as
memory-safe
is detrimental to gas optimizations. In particular, the attached case led to ~16.6% increased gas consumption where tagging assembly asmemory-safe
.Code listing
```solidity // SPDX-License-Identifier: LGPL-3.0-only pragma solidity ^0.8.20; contract Base64Url { function encode(bytes32 input) private pure returns (string memory output) { assembly ("memory-safe") { output := mload(0x40) mstore(0x40, add(output, 96)) mstore(output, 43) // Write the base-64 lookup table to the scratch space, simplifying the copy code. Note // that we write the prefix with `MSTORE` as Solidity does not have support for // `CODECOPY` from bytes or string constants in inline assembly. mstore(0, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef") mstore(32, "ghijklmnopqrstuvwxyz0123456789-_") let ptr := add(output, 32) for { let shift := 250 } sgt(shift, 0) { shift := sub(shift, 6) } { mstore8(ptr, byte(0, mload(and(0x3f, shr(shift, input))))) ptr := add(ptr, 1) } mstore8(ptr, byte(0, mload(and(0x3f, shl(2, input))))) } } function bench() external view returns (uint256 gas, bytes32 hash) { bytes32 input = keccak256("test"); gas = gasleft(); string memory output = encode(input); assembly ("memory-safe") { pop(staticcall(gas(), 0x2, output, mload(output), 0, 32)) hash := mload(0) } bytes memory data = abi.encode(hash); assembly ("memory-safe") { pop(staticcall(gas(), 0x2, data, mload(data), 0, 32)) hash := mload(0) } gas = gas - gasleft(); } } ```In this particular case, removing the
memory-safe
tag from any of the assembly blocks with result in much more optimal code. From what I understand, having a memory unsafe block will cause the compiler to disable a class of optimization, of them one particular one which is causing it to generate non-optimal code.I analyzed the compiler output, and it appears to be generating a lot of additional
DUP
,SWAP
andPOP
stack shuffling inside thefor
loop when all the optimizations are enabled with thememory-safe
tag, and, because thefor
loop iterates 42 times, causes the large gas discrepancy. Additionally, it is worth noting that the following twostaticcall
s to the SHA-256 precompiles are crucial to confusing the optimizer.Some other note worthy findings that support that the optimizer is messing up the stack shuffling in this case are:
staticcall
s to the SHA-256 precompile that follow. In particular, if you remove any of them, the problem is no longer observed and you get the expected behaviour of code withmemory-safe
assembly tags being slightly more performant than potentially memory unsafe assembly.external
function for example), then the bug goes away, and the compiler generates optimal code again. So really the issue is the assembly block followed by the twostaticcall
s.Environment
Steps to Reproduce
See code listing :point_up:. Compiler settings: