howlbot-integration[bot] commented 1 month ago

Lines of code

https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/bridge/SequencerInbox.sol#L314 https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/challengeV2/libraries/ChallengeEdgeLib.sol#L250-L251

Vulnerability details

Cause

The BOLD dispute protocol requires multiple turn-based steps. When deployed on L2 for enabling the L3 use-case, the L2 sequencer can exploit the L3. It can do that by censoring honest validators dispute moves and forcing them to use the L1 force inclusion mechanism. This allows the sequencer to attack or collude with an attacker to create invalid assertions for L3 chains settling to L2. By causing honest L3 validators to use the L1 force inclusion mechanism, the transactions will be included on L2 via L1 with a delay of up to 24 hours for each turn. As the dispute requires many tens of turns (around 50) to complete, these accumulated delays will cause the honest validators to lose.

Impact

The sequencer can submit arbitrary withdrawals and drain any L3 deployed on L2. While generally the sequencer should not be able to cause any safety violations for the sequenced chain, in this case, they can use their limited censorship ability to violate the safety of any L3 by preventing the timely progress of its dispute games.

The dispute requires around 50 turn-based steps from an honest participant, each delayed by possibly up to 24 hours. Because each participant is limited to a 7-day chess clock, honest participants can be forced to lose due to the L1 inclusion delays. This allows the attacker to be able to confirm an arbitrary assertion and prove arbitrary withdrawals from all L3 chains using BOLD on L2.

Notably, even if the maximum or average delay can be reduced, with 50 turn based steps, even if it's shortened to only 4 hours, the 7 day clock will be exhausted by the delays alone. Moreover, even if the average delay can be reduced to even 2 hours, this will leave the honest validators much less time than needed to reliably guarantee the safety of the chain according to the assumptions of the protocol.

L3s BOLD deployments are in scope

While this only is relevant for L3s using BOLD on L2s (Arbitrum itself), and not for L2s using BOLD on L1, the scope of this contest clearly includes the L3 use case:

Code: RollupCore and RollupAdmin code setting _assertionCreatedAtArbSysBlock and _hostChainIsArbitrum using ArbitrumChecker.runningOnArbitrum() that differentiates behavior and storage between deployments on L1 and L2 (rollups on Arbitrum itself - L3s) indicating the code is intended for both L1 and L2 deployment.
Docs: confirmed in the supplied documentation https://docs.arbitrum.io/how-arbitrum-works/bold/gentle-introduction

Q: For an L3 Orbit chain, secured using BOLD, that settles to Arbitrum One, does the one-step proof happen on Arbitrum One? A: Yes

Direct public confirmation by sponsor in Discord: https://discord.com/channels/810916927919620096/1235688998480117790/1242079453551661096

Participant: "And it will be work work layer 3 on arbitrum as well that want a implement the bold mechanism" Sponsor: "yeah that's right"

Severity

Because the sequencer is not trusted with safety with L3s, and is not trusted within the scope of fault proofs, the severity is high. This is not only true due to the trust guarantees, but also due to the intention to decentralize the sequencer role. The loss of funds and trust in the system would be catastrophic, due to the total loss of funds that the L2 sequencer operator is capable of inflicting upon the L3s.

Proof of Concept

As can be seen in the testCanConfirmByOneStep test which runs a challenge from start to finish, if a dishonest assertion is made, the honest validator must complete the full dispute.

The full dispute involves 6 levels: block level, then 4 intermediate levels, and a final SmallStep level. Additionally, a final confirmEdgeByOneStepProof must be called to prove the final small step single instruction on-chain to prevent the dishonest edge from being confirmed by time and its assertion being confirmed. This amounts to 6 levels and 1 additional call.
For each level, multiple interactions are needed, each can only be done after the opponent's move: createLayerZeroEdge to dispute the claim, and then several bisectEdge calls. Crucially, bisectEdge calls are done in turns because each bisected edge has to be "rivaled" before being further bisected until an edge of length one is reached for that level. Thus, it's not possible for the honest challenger to fully bisect their claim ahead of time (in a single L1 "move"). They must take turns with the attacker because they need to target the attacker's last move in their response (these calls must be done in turns). However, while the attacker has no delay, the honest move is delayed every time by having to go through L1 and the long force inclusion delay.
Because all progress is made by taking turns bisecting a 2^43 space of disagreements, it's easy to see that the number of turns is around 43 turn-based moves (as each bisection divides the space into two parts). In addition to this, the calls for createLayerZeroEdge for each level, and the final call to confirmEdgeByOneStepProof need to be included, for up to 50 turn-based actions.
For example, for the smaller config used in the tests (3 intermediate levels and 6 bisections in each level), 31 turn-by-turn moves are made by each opponent.
Clearly, 50 moves (or even the test config's 31 moves), each delayed by up to 24 hours, very quickly causes the honest challengers to "time out" and exhaust their 7-day clock (from the time of the first move).

Here's a step-by-step scenario for the attack described in the submission:

An L3 chain, called OrbitChain using the BOLD dispute protocol is deployed on Arbitrum (L2).
A malicious L2 (Arbitrum) sequencer operator decides to use its temporary sequencing censorship privileges to create an invalid assertion for the L3 chain.
The attacker submits the invalid assertion to the L2 Rollup chain as an assertion - newStakeOnNewAssertion.
Honest L3 validators recognize the assertion as invalid and attempt to initiate a dispute by calling createLayerZeroEdge to challenge the claim.
The malicious L2 sequencer censors the honest validators' L2 transactions, forcing them to use the L1 force inclusion mechanism to include their transactions on L2. The transactions are included with a significant delay due to the design of the force inclusion delay mechanism. During the delay the attacker's is unrivaled and gains "unrivaled" time.
The attacker responds to the honest validators' dispute without any delay, as their transactions are not censored by the sequencer.
For each level of the dispute (block level, intermediate levels, and SmallStep level), the honest validators and the attacker take turns calling bisectEdge to narrow down the disagreement space.
Due to the L1 force inclusion delay of up to 24 hours for each turn, the honest validators' moves are significantly delayed, while the attacker can respond immediately. During the delay the attacker's is unrivaled and gains "unrivaled" time.
As the dispute progresses through the levels, requiring around 50 turn-based steps, the accumulated delays from the L1 force inclusion mechanism cause the honest validators to exhaust their 7-day chess clock by allowing the attacker to accumulate 7 days of "unrivaled" time.
The honest validators are unable to complete the dispute within the time limit, allowing the attacker's invalid assertion to be confirmed.
The attacker calls confirmEdgeByTime to finish the dispute, and calls confirmAssertion to prove its assertion.
As a result, the invalid assertion is confirmed and added as a root into the outbox, enabling the attacker to submit arbitrary withdrawals and drain funds from the L3 chain.

Tools Used

Manual review.

Recommended Mitigation Steps

The chess-clock and the force-inclusion delay are inherently incompatible with such a long turn-based game if played on L2. In its current highly-interactive form, the application of BOLD should be limited to deployments on L1 (for use in validation of L2s) only, and not for L3s.

Alternatively, two improvements can be made to make the attack less impactful:

Reduce the number of turns to a very small number. This can be done by having each move post all of the data required to challenge any of its components. For example, the levels can be divided into 4 levels such that each level has 2^11 sub-levels (sub-edges) and a call createLayerZeroEdge contains all 2^11 sub levels (how this can be possible is detailed below). This way there are only 2 "turns":
1. Attacker: createLayerZeroEdge on first level (e.g., level 1) with 2048 sub-levels already detailed on-chain.
2. Defender: challenge the speciic sub-level - on level 2 already, via createLayerZeroEdge on second level, with all 2048 sub-levels included on-chain.
3. Attacker: level 3, with all data required for level 4 counter.
4. Defender: level 4, with all specific instructions.
5. Attacker: invoke OSP and settle dispute, or wait to lose on time.
Reduce the force inclusion delay to a smaller duration, for example 6 hours.

The combination of these two factors reduce the number of turn-based moves the challenger (defender) needs to make to only 2. If each of those moves is delayed by 6 hours of the force inclusion delay, only 12 hours of the 7 days' clock is lost to the attack, allowing the dispute to proceed with manageable impact.

Posting the required sub-level data would be more expensive for each transaction, but it also reduces the overall amount of transactions and complexity. Additionally on L1, data blobs can be used for this need to make this cheaper. On L2 calldata will be used, which will be posted to L1 blobs, which also make the costs manageable.

In order to store this data on-chain, instead of storing the full data, a merkle root can be calculated and stored from the input data (on-chain). Although merkelization of 2048 would need to be executed on-chain it would be manageable (roughly 4096 hashes) costing around 200K gas. Any subsequent player would be able to construct a proof for any element into the tree from the previously posted data.

Assessed type

Other

godzillaba commented 1 month ago

The force inclusion delay buffer feature prevents this type of repeated censorship by the sequencer.

Picodes commented 1 month ago

From the doc and the whitepaper the 7-day delay is intended for finalization on L1, so to me this is at most a misconfiguration issue, so falls within QA

c4-judge commented 1 month ago

Picodes changed the severity to QA (Quality Assurance)