code-423n4 / 2024-05-arbitrum-foundation-findings

3 stars 2 forks source link

Upgrades to EdgeChallengeManager using setChallengeManager can allow to confirm bad assertion #48

Closed howlbot-integration[bot] closed 4 months ago

howlbot-integration[bot] commented 4 months ago

Lines of code

https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/rollup/RollupAdminLogic.sol#L326-L329

Vulnerability details

Description

The report is related to my other report related to EdgeChallengeManager (a.k.a. ECM) upgrade (Upgrades to EdgeChallengeManager changing staking requirement can cause funds loss), so you should read it first before continuing as I will not re-explain what was explained in my previous report to save both the reader and myself time.

This report is exploring the impact of upgrading the EdgeChallengeManager.sol entirely (using RollupAdminLogic::setChallengeManager) and the impact on the BoLD system overall. This is a new design, while before the stakeToken could be replaced (using RollupAdminLogic::setStakeToken), which is now being replaced by the fact that instead you replace the ECM as a whole (which might have a another stakeToken).

While the golang BoLD code is out-of-scope, since there is a key implication with how EdgeChallengeManager.sol and RollupAdminLogic.sol evolve in case of upgrade, I believe this should be considered in scope as involving directly those contract life cycle and seems to warrant High severity.

Validators are connected to a single instance ECM when they start. This can be seen at multiple places in the code but more importantly when creating the AssertionManager Manager::NewManager, and same for the assertion chain which internally use the same single ECM.

The validators will keep pooling the blockchain and syncing any new assertion created, and create challenge when it detect that an assertion is wrong, this is done inside sync.go. In case an ECM upgrade occurs, the following condition will trigger once the new ECM instance is officialized (RollupAdminLogic::setChallengeManager), which will have no real impact beside a logging warning.

    if args.canonicalParent.ChallengeManager != m.challengeManagerAddr {
        srvlog.Warn("Posted rival assertion, but could not challenge as challenge manager address did not match, "+
            "start a new server with the right challenge manager address", log.Ctx{
            "correctAssertion":                  postedRival.AssertionHash,
            "evilAssertion":                     args.invalidAssertion.AssertionHash,
            "expectedChallengeManagerAddress":   args.canonicalParent.ChallengeManager,
            "configuredChallengeManagerAddress": m.challengeManagerAddr,
        })
        return nil, nil
    }

We can see that the validator supporting BoLD will use the following code most likely (well similar, as this is from test, but would pull the ECM from the Rollup too) to pull the new ECM instance. Even if you could specify manually the ECM address, I don't think the current integration would be supporting a validator running multiple instances of the BoLD protocol, it doesn't seem to have been built for that at least out-of-the-box (but I could be wrong).

    challengeManagerAddr, err := assertionChainBinding.RollupUserLogicCaller.ChallengeManager(
        util.GetSafeCallOpts(&bind.CallOpts{Context: ctx}),
    )

Impacts

  1. (front-run) If all honest validator decides to restart as soon as the warning is being seen, that would left the old ECM with unfinished challenge(s), which could means some challenge stucks and/or bad assertion can be confirmed. For example, it could allow an evil validator to front-run the setChallengeManager call, and post a bad assertion against the old ECM which will never be challenged, and confirmed later on by the evil validator.
  2. (back-run) If all honest validator doesn't restart for a long enough period of time (enough time to have the assertion confirmed by time, so 6.36 days), a bad assertion could be confirmed on the new ECM instance by an evil validator, which would have submitted the new assertion as soon as detected the setChallengeManager call.

Proof of Concept

This is showcasing Impact 2 which seems to have higher probability to happen in production. We have 3 actors in play: Admin, Honest and Evil validator. 1) Admin deploy ECM contract. Everything run smootly for days. 2) Admin decides to upgrade the contract changing the staking requirement, which require a full upgrade and call setChallengeManager. 3) An Evil validator is monitoring the mempool and see the incoming setChallengeManager transaction. He decide to back-run it and post a new bad assertion against the new ECM instance. 4) The Honest validator is running smootly, while it will detects this assertion is bad and require a challenge, it will not chanllenge it as it belong to another ECM instance, but will post a warning in logs but no other action will be done. 5) After 6.36 days (a.k.a. current challengePeriodBlocks value), the Evil validator will be able to confirm the assertion, as it will not have been challenged (Honest will still be challenging only against the old ECM as not restarted). Furthermore, the challengeGracePeriodBlocks will not be considered as again this assertion would not have been challenged.

        if (prevAssertion.secondChildBlock > 0) {
            // if the prev has more than 1 child, check if this assertion is the challenge winner
            ChallengeEdge memory winningEdge = IEdgeChallengeManager(prevConfig.challengeManager).getEdge(winningEdgeId);
            require(winningEdge.claimId == assertionHash, "NOT_WINNER");
            require(winningEdge.status == EdgeStatus.Confirmed, "EDGE_NOT_CONFIRMED");
            require(winningEdge.confirmedAtBlock != 0, "ZERO_CONFIRMED_AT_BLOCK");
            // an additional number of blocks is added to ensure that the result of the challenge is widely
            // observable before it causes an assertion to be confirmed. After a winning edge is found, it will
            // always be challengeGracePeriodBlocks before an assertion can be confirmed
            require(
                block.number >= winningEdge.confirmedAtBlock + challengeGracePeriodBlocks,
                "CHALLENGE_GRACE_PERIOD_NOT_PASSED"
            );
        }

Recommended Mitigation Steps

Mitigation from the BoLD implementation - Approach One

  1. Have the validator BoLD implementation to listen OwnerFunctionCalled event and when id 32 is detected (setChallengeManager upgrade), instanciate another instance of the BoLD implementation. Yes that would means a validator can run multiple instances of BoLD implementation at the sametime, if this is not supported, that would need to be added.
    function setChallengeManager(address _challengeManager) external {
        challengeManager = IEdgeChallengeManager(_challengeManager);
        emit OwnerFunctionCalled(32);
    }
  2. Keep track which challenges are in progress, and once all assertion on the old ECM are confirmed, the validator could kill this BoLD instance.
  3. This tracking would also allow more robustness as if validator restart (crash or maintenance) after an upgrade, it should spawn an instance also (as he would anyway spawn an instance for the current ECM) against the old ECM in order to complete those challenge(s).

Mitigation from the BoLD implementation - Approach Two

Another complete different option would be to keep a single instance of the BoLD protocol running in the validator, but refactor it such that it is not bounded to a single ECM, but rather interact with the correcponding ECM based from the where the assertion have been created against.

Mitigation from the ECM

Another option would be to allow only normal upgrade and track the staking requirement inside the Challenge struct as suggested in my other report, which would remove the need todo full upgrade which change also the proxy and create this whole issue.

Assessed type

Upgradable

c4-sponsor commented 4 months ago

gzeoneth (sponsor) disputed

gzeoneth commented 4 months ago

Invalid, go-impl is out-of-scope. If the chal manager changes, we will deploy two instances of the go implementation.

Picodes commented 4 months ago

I don't see how we could consider this as in scope as the issue and the mitigation are all within the go implementation

c4-judge commented 4 months ago

Picodes marked the issue as unsatisfactory: Out of scope