c4-submissions commented 11 months ago

Lines of code

https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/zksync/facets/Executor.sol#L304

Vulnerability details

Impact

Protocol Version Discrepancy: In cases where an L2 upgrade fails but is processed on L1 without concern for its execution result, the protocol version is advanced while the actual system remains unaltered on L2. This results in a discrepancy between the recorded protocol version and the operational state of the system.
Unique Transaction Hash Requirement: The protocol typically requires that the transaction hashes of L2 system upgrades be unique, with the nonce of the L2 upgrade transaction equal to the new protocol version. When an L2 upgrade fails but is not appropriately recognized on L1, this requirement is breached, necessitating a refactor of the contracts involved.

Proof of Concept

In the course of an upgrade, the sequence of function calls is as follows:

Governance::execute >> Governance::_execute >> DiamondProxy::fallback >> Admin::executeUpgrade >> Diamond::diamondCut >> Diamond::_initializeDiamondCut >> DefaultUpgrade::upgrade https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/upgrades/DefaultUpgrade.sol#L25

During an upgrade, a new protocol version is established, and it should be numerically higher than the previous version. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/upgrades/DefaultUpgrade.sol#L28

Let's examine two scenarios:

In the first case, if the upgrade includes an L1 upgrade and it fails to execute successfully, it will result in a rollback that reverses the entire upgrade transaction. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/governance/Governance.sol#L230
In the second case, if the upgrade includes an L2 upgrade and it fails to execute successfully on L2, there's a challenge because the execution should take place on L2. As a result, it is unclear immediately whether the upgrade was successful.

The issue lies in the second case. Let's assume that the upgrade exclusively involves an L2 upgrade, and the upgrade transaction is executed on L1 such as the following flow of function calls:

DefaultUpgrade::upgrade >> BaseZkSyncUpgrade::_setL2SystemContractUpgrade

Crucially, within this function, the transaction hash of the upgrade transaction is stored in s.l2SystemContractsUpgradeTxHash. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/upgrades/BaseZkSyncUpgrade.sol#L193

Subsequently, this upgrade transaction should be executed on L2 with _l2ProtocolUpgradeTx.txType = 254. In the bootloader, when processing such a transaction type, the canonicalL1TxHash is sent natively to L1. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/system-contracts/bootloader/bootloader.yul#L587

Following that, the processL1Tx function is invoked, in which the transaction is prepared and subsequently executed. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/system-contracts/bootloader/bootloader.yul#L589

During the execution, if, for any reason, such as running out of gas, the execution fails, the entire program does not revert. Instead, an L2 to L1 log is sent with success = false. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/system-contracts/bootloader/bootloader.yul#L908-L919 https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/system-contracts/bootloader/bootloader.yul#L966

On the L1 side, during the batch commitment, the function _processL2Logs is called. Within this function, a check is made to ensure that _expectedSystemContractUpgradeTxHash is equal to the canonicalL1TxHash associated with the L2 upgrade transaction. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/zksync/facets/Executor.sol#L157

The issue stems from the fact that, during the execution of the batch, the s.l2SystemContractsUpgradeTxHash is cleared, regardless of the outcome of the L2 upgrade transaction on L2. To put it differently, on the L2 side, if the upgrade transaction is executed and returns a 'false' result, it signifies that the upgrade was not correctly implemented on L2. However, on the L1 side, it does not care about the outcome of the L2 upgrade transaction on L2, it only cares about the execution of the L2 upgrade transaction, regardless of its success or failure. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/zksync/facets/Executor.sol#L304

The implications of this issue are as follows:

On L1, s.l2SystemContractsUpgradeTxHash becomes zero, indicating that the previous L2 upgrade has been completed. However, it's important to note that the protocol version was incremented despite the absence of an actual upgrade on L2 due to the unsuccessful execution. Consequently, the protocol version is raised without a corresponding system upgrade.
A discrepancy arises between s.protocolVersion and the new upgrade transaction. This discrepancy occurs because the protocol aims to have unique hashes for L2 system upgrade transactions, requiring that the nonce of the L2 upgrade transaction be equal to the new protocol version. Moreover, the new protocol version should be numerically higher than the previous one. Therefore, it appears that the contracts DefaultUpgrade and BaseZkSyncUpgrade require a revision to address this scenario. https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/upgrades/BaseZkSyncUpgrade.sol#L217 https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/upgrades/BaseZkSyncUpgrade.sol#L186
The owner is required to schedule another upgrade, and this necessitates the passage of time for the delay between proposal and execution to elapse (in the case of a delayed upgrade).

Tools Used

Recommended Mitigation Steps

One viable approach is to address this issue by incorporating the result of the L2 upgrade transaction into the batch execution process. If the L2 upgrade fails, this solution entails resetting the protocol version to its previous state. However, when an upgrade involves both L1 and L2 components, a straightforward rollback of the protocol version to its former state is not feasible, as the L1 upgrade has succeeded while the L2 counterpart has encountered difficulties.

function executeBatches(StoredBatchInfo[] calldata _batchesData) external nonReentrant onlyValidator {
        //...

        uint256 batchWhenUpgradeHappened = s.l2SystemContractsUpgradeBatchNumber;
        if (batchWhenUpgradeHappened != 0 && batchWhenUpgradeHappened <= newTotalBatchesExecuted) {
            delete s.l2SystemContractsUpgradeTxHash;
            delete s.l2SystemContractsUpgradeBatchNumber;
            if (!proveL1ToL2TransactionStatus(...)){ // checking the L2 upgrade tx was successful or not
               s.protocolVersion = s.OldProtocolVersion; // assuming the old protocol version is stored
            }
        }
    }

https://github.com/code-423n4/2023-10-zksync/blob/1fb4649b612fac7b4ee613df6f6b7d921ddd6b0d/code/contracts/ethereum/contracts/zksync/facets/Executor.sol#L291

Assessed type

Context

c4-pre-sort commented 11 months ago

bytes032 marked the issue as primary issue

c4-pre-sort commented 11 months ago

141345 marked the issue as sufficient quality report

miladpiri commented 11 months ago

Medium is fair. We can make the bootloader fail if the upgrade transaction is unsuccessful. In general, it is a good idea that is needed to be implemented.

c4-sponsor commented 11 months ago

miladpiri (sponsor) confirmed

miladpiri commented 11 months ago

The duplicates indicated by the pre-sorter are irrelevant. They should be judged seperately.

GalloDaSballo commented 10 months ago

The Warden has shown how, due to operative risks, a failed upgrade transaction can create a desynchronization between the upgraded version and the version reported

I believe the finding leans between Operative Mistake (QA) and risk of protocol DOS (Med) and am leaning towards Med due to the Sponsors comment

c4-judge commented 10 months ago

GalloDaSballo marked the issue as selected for report

0xunforgiven commented 10 months ago

Hi @GalloDaSballo, @miladpiri. Thanks for reviewing my comment.

I believe there is a lot of hypothesize and conditions and operational mistakes requires for Issue #214. we have multiple similar issues that are invalidated, I have listed some of them in issue #221's comment.

This commnet explain that there is no real impact mentioned in issue #214 except a burned version number which can be replace by L1 upgrade tx.

Duplicate Issue:

Further more, I believe issue #748 is exact match to #214:

214 says:

In cases where an L2 upgrade fails but is processed on L1 without concern for its execution result, the protocol version is advanced while the actual system remains unaltered on L2.
748 says:

So the system update transaction, whether it succeeds or fails, is treated as it had already been executed, by going through steps 1-4. and clearing the l2SystemContractsUpgradeTxHash

System update execution failures are ignored, and if they are security patches, they are ignored, which may cause other transactions to continue to run normally causing security risks.

So both issues #214 and #748 showing concerns regarding L1 contract accepting reverted L2 upgrade transaction as success. but sponsor comment regarding them are is contradictive:

Sponsor Comment:

The sponsor response for issue #214 is (link):

Medium is fair. We can make the bootloader fail if the upgrade transaction is unsuccessful. In general, it is a good idea that is needed to be implemented.

While the sponsor response for issue #748 is(link):

We should verify the behavior, when upgrade transaction fails. The suggested fix is probably not correct (failing full bootloader, when update fails, means that the whole chain is stuck). The chain should not process if the upgrade transaction fails. However, it is more of an operational issue, so it is Low. Not duplicate.

These are obviously in contradict with each other, So I believe there is a misunderstanding or miscommunication here. based on the above facts and the fact the upgrade txs are carefully created by admins and reviewed by security council, and the fact that there is no Mediu/High impact mentioned here, and the fact that governance can perform upgrade/revert by L1 upgrade, I suggest low severity for this issue.

code-423n4 / 2023-10-zksync-findings

Synchronization Issue Between L1 and L2 Upgrades #214