code-423n4 / 2024-05-arbitrum-foundation-findings


Inconsistent sequencer unexpected delay in DelayBuffer may harm users calling forceInclusion() #55

Open howlbot-integration[bot] opened 1 month ago

howlbot-integration[bot] commented 1 month ago

Lines of code

https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/bridge/DelayBuffer.sol#L43
https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/bridge/DelayBuffer.sol#L90-L98
https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/src/bridge/SequencerInbox.sol#L287

Vulnerability details

Impact

Buffer depletion for unexpected delay due to a sequencer outage is inconsistent.

Proof of Concept

When the sequencer is down, users may call SequencerInbox::forceInclusion() to get their messages added to the inbox sequencerInboxAccs. However, there is an inconsistency when the sequencer has been down and more than one delayed message is pending (or even just one).

The DelayBuffer is used to dynamically adjust how long a user has to wait before calling SequencerInbox::forceInclusion(). The buffer increase mechanism is not relevant for this issue.

The buffer decrease is computed as self.prevBlockNumber, the block number of the delayed message used in the previous buffer update, subtracted from self.prevSequencedBlockNumber, the block number at which the buffer was last updated. It is done this way to ensure that sequencer delay cannot be double counted, as the delayed inbox may contain more than one delayed message.
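
For reference, a simplified sketch of how the depletion window is computed (an approximation of DelayBuffer.sol for illustration; the names and threshold handling are paraphrased, not the verbatim implementation):

function sketchUnexpectedDelay(
    uint64 prevBlockNumber,          // block of the delayed message used in the last update
    uint64 prevSequencedBlockNumber, // block at which that message was sequenced
    uint64 threshold
) internal pure returns (uint64 unexpectedDelay) {
    // delay experienced by the *previous* message; using the previous pair
    // prevents counting the same blocks twice across several delayed messages
    uint64 delay = prevSequencedBlockNumber > prevBlockNumber
        ? prevSequencedBlockNumber - prevBlockNumber
        : 0;
    // only delay beyond the threshold is "unexpected" and depletes the buffer
    unexpectedDelay = delay > threshold ? delay - threshold : 0;
}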

However, the approach taken to protect the sequencer from incorrect buffer depletion has the drawback that the buffer will not always be decreased when it should be.

For example, say the sequencer is working at block number A, a message is submitted at block number A + 10 and another one at block number A + 110. If the sequencer is down, the user has to wait, for example, a delay of 100 blocks (or the delay buffer, whichever is smaller). If 200 blocks have passed since A + 10 (the first message), both delayed messages may be force included at block number A + 210.

The discrepancy is that, depending on how the delayed messages are included, the buffer delay will be reduced differently. If both messages are removed at once, by calling SequencerInbox::forceInclusion() with the id of the newest message, the buffer delay will not be decreased, as self.prevBlockNumber is A, the same as self.prevSequencedBlockNumber. This would harm users that want to force include messages, as the buffer would be bigger and they would have to wait longer for their messages to be force included.

If the oldest message is force included first, the buffer delay will not be decreased either, as above, but self.prevSequencedBlockNumber will be updated to A + 210 and self.prevBlockNumber to A + 10. Then, if the second message is force included, self.prevSequencedBlockNumber - self.prevBlockNumber == A + 210 - (A + 10) == 200, which means that the buffer would correctly decrease, as the sequencer was offline (ignoring the threshold; the issue is the same as long as the delay is bigger than the threshold).
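
The two orderings can be summarized as follows (a worked comparison using the hypothetical numbers from the example above):

// A = last block at which the sequencer was active; threshold ignored.
// msg1 submitted at A + 10, msg2 at A + 110, force inclusion at A + 210.
//
// Order 1: force include both at once via msg2's id
//   previous pair: prevBlockNumber = A, prevSequencedBlockNumber = A
//   depletion = A - A = 0                              -> buffer NOT decreased
//
// Order 2: force include msg1 first, then msg2
//   after msg1: prevBlockNumber = A + 10, prevSequencedBlockNumber = A + 210
//   after msg2: depletion = (A + 210) - (A + 10) = 200 -> buffer decreased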

As a proof of concept, add the following test to SequencerInbox.t.sol. It shows that, given 2 messages, the delay buffer is only decreased if the first message is force included first and only afterwards is the newest message included. If the given delayedMessagesRead is the id of the second message, without force including the first one first, the buffer will not decrease.

function test_POC_InconsistentBuffer_Decrease() public {
    BufferConfig memory configBufferable = BufferConfig({
        threshold: 600, //60 * 60 * 2 / 12
        max: 14400, //24 * 60 * 60 / 12 * 2
        replenishRateInBasis: 714
    });

    (SequencerInbox seqInbox, Bridge bridge) = deployRollup(false, true, configBufferable);
    address delayedInboxSender = address(140);
    uint8 delayedInboxKind = 3;
    bytes32 messageDataHash = RAND.Bytes32();

    (uint64 bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    vm.startPrank(dummyInbox);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.roll(block.number + 1100);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.stopPrank();

    vm.roll(block.number + 500);

    uint256 delayedMessagesRead = bridge.delayedMessageCount();

    // buffer is not decreased if the first and second messages are included at once
    seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 500), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    // buffer is only decreased if the first message is included first
    /*seqInbox.forceInclusion(
            delayedMessagesRead - 6,
            delayedInboxKind,
            [uint64(block.number - 1600), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
    );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 500), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 13478);*/
}

Tools Used

VSCode

Foundry

Recommended Mitigation Steps

Care must be taken with any mitigation so as not to introduce double accounting of the buffer delay. One solution that would fix this issue is tracking the total unexpected delay separately and making it equal to block.number minus the maximum of the previous sequenced block number and the block number of the oldest delayed message that was not yet included. Taking the maximum with the last sequenced block number guarantees that no double accounting of delays takes place; taking the maximum with the oldest delayed message guarantees that the delay is real and not merely a period in which no message was submitted.
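
A minimal sketch of this idea (hypothetical helper and parameter names, not a drop-in patch for DelayBuffer.sol):

// The depletion window starts at the later of the previous sequenced block
// (no double counting) and the oldest pending delayed message's block
// (the delay must be real); both parameter names here are hypothetical.
function sketchTotalUnexpectedDelay(
    uint64 prevSequencedBlockNumber,
    uint64 oldestPendingMessageBlock
) internal view returns (uint64) {
    uint64 start = prevSequencedBlockNumber > oldestPendingMessageBlock
        ? prevSequencedBlockNumber
        : oldestPendingMessageBlock;
    return uint64(block.number) > start ? uint64(block.number) - start : 0;
}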

Assessed type

Math

c4-sponsor commented 1 month ago

gzeoneth (sponsor) disputed

gzeoneth commented 1 month ago

The delay buffer is an intermediary goal, not a final goal. The purpose of the delay buffer is to provide force inclusion assurances. The delay buffer updates are staggered, and the buffer updates proactively in the force inclusion method (https://github.com/OffchainLabs/bold/blob/32eaf85e8ed45d069eb77e299b71fd6f3924bf40/contracts/src/bridge/SequencerInbox.sol#L309). The behavior described is not unexpected and does not impact force inclusion.

Picodes commented 1 month ago

This report shows how the delay buffer update could be nondeterministic and depend on how messages are included, but, as shown by the sponsor, it fails to demonstrate why this would be important and how it fulfills the definition of Medium severity (the function of the protocol or its availability could be impacted).

c4-judge commented 1 month ago

Picodes changed the severity to QA (Quality Assurance)

Simao123133 commented 1 month ago

Hi @Picodes, I made an illustration highlighting the impact of this issue: [illustration: DelayBufferExplanation]

The image matches the POC exactly: it can be seen that if the newest message (at T4) is force included, the buffer is not correctly depleted and the delay only accounts for T5 - T4. Thus, the sponsor's comment is not applicable to this scenario, as the updates are not staggered and not proactive: time intervals are being skipped (between T3 and T4). The updates would be staggered and proactive if T3 were force included first, and only then T4, but this is not guaranteed by forceInclusion().

We can also confirm this issue by looking at SequencerInbox::delayProofImpl(), which correctly picks the oldest delayed message (T3 in the example) to update the buffer, exactly to avoid this problem and not skip any interval. Thus, the sponsor's comment applies to SequencerInbox::delayProofImpl(), but not to forceInclusion(), where the time interval between the newest and the oldest messages in the batch may be skipped.

Finally, the definition of a medium severity vulnerability is the following:

but the function of the protocol or its availability could be impacted

This falls exactly under the impact described in the report, as the availability of force inclusion for users is impacted:

This would harm users that want to force include messages, as the buffer would be bigger and they would have to wait longer for their messages to be force included.

For reference, here is the time that users have to wait, which is given by the size of the buffer. If the buffer is bigger than it should be because it was not correctly depleted, users have to wait longer to force include their messages.
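
Concretely, the relation can be sketched as follows (a simplified assumption for illustration, not the verbatim SequencerInbox code):

// A delayed message submitted at block `msgBlock` can only be force included once
//     block.number - msgBlock > min(delayBlocks, bufferBlocks)
// so an under-depleted bufferBlocks directly lengthens the wait,
// up to the delayBlocks maximum.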

gzeoneth commented 1 month ago

There is still no harm; the delay buffer code allows t3 to be force included before t4 if there is already sufficient delay. If the sender of t3 chooses not to force include t3, it is OK not to deplete the relevant delay buffer. From the forceInclusion caller's perspective, the sequencer still cannot cause more delay than expected.

Simao123133 commented 1 month ago

There is still no harm; the delay buffer code allows t3 to be force included before t4 if there is already sufficient delay.

This is not guaranteed, someone may force include t4 and skip a large number of blocks and not deplete the buffer.

If the sender of t3 chooses not to force include t3, it is OK not to deplete the relevant delay buffer.

It is a bug; acknowledging it does not reduce the validity of the issue.

From the forceInclusion caller's perspective, the sequencer still cannot cause more delay than expected.

Yes it can: by always force including the most recent message, the buffer is not correctly depleted, so users have to wait longer, hence a bigger delay. It harms users.

gzeoneth commented 1 month ago

Force inclusion can be called by anyone; any interested honest party can call it as soon as possible when they observe sufficient delay and would never see the "bug" you described.

Simao123133 commented 1 month ago

And any malicious party can do otherwise.

gzeoneth commented 1 month ago

And any malicious party can do otherwise.

Not sure what you mean by otherwise; if the honest party calls forceInclusion at the earliest possible block, the malicious party's (non)actions are irrelevant.

Simao123133 commented 1 month ago

Not sure what you mean by otherwise; if the honest party calls forceInclusion at the earliest possible block, the malicious party's (non)actions are irrelevant.

The bug still exists. Agreed, it can be mitigated by actively managing force inclusions and triggering them sequentially. However, nothing like this was documented, and it places other constraints on the system, where each message has to be force included sequentially, incurring extra infra/gas costs, possible downtime failures, and so on. If the sequencer were down and a very significant number of users were force including, what would be the cost of making sure all force inclusions are sequential?

The force inclusion method specifically allows force including a batch of messages at once, and the availability of this specification would be compromised to fix this bug. Either way, it is a valid medium severity issue.

Picodes commented 1 month ago

@Simao123133 I still side with the sponsor on this one and think this is of Low severity, and I'll try to make my point:

Simao123133 commented 1 month ago

Hi @Picodes, thank you for the rereview. I am trying my hardest to discuss this in the most fact based way possible.

I don't agree that messages will be force included sequentially like that; gas prices are often expensive on Ethereum, so it is perfectly likely users wait for someone else to trigger it for free. Most importantly, this is part of the spec and we cannot go around it: force inclusions are supposed to correctly include batches of messages without introducing bugs. Someone would have to be actively managing force inclusions to avoid this, incurring other costs and going against the expected behaviour. A counterargument based on the claim that an expected and accepted functionality will never be used cannot hold.

The design is not suboptimal; it is a bug, as can be proven by the other method that includes delayed messages (delayProofImpl), where the code always calculates the buffer on the oldest one.

The issue is that it's not just a matter of a different order: the current code is incorrect and punishes users.

Think about it this way: the sequencer has been down for some time; how is it acceptable that users still have to wait the maximum amount of time due to this bug? According to the spec, they should be able to force include much faster and not go through many hours of waiting.

It directly impacts the specifications for users and the ability to correctly force include batches of delayed messages, breaking protocol functionality, so it cannot be Low. It does require not force including right away, so Medium is appropriate.

Simao123133 commented 1 month ago

@gzeoneth the fix is actually quite simple. Just apply the buffer update via buffer.update(totalDelayedMessagesRead);, exactly like in delayProofImpl(). This will smoothly add up all sequencer delays, because they are correctly capped to the time elapsed between the current and last block numbers. Check the following illustration comparing force inclusions using the latest versus sequential delayed messages, which shows that the buffer is correct using this fix: [illustration]
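
In other words (a hedged sketch of the suggested change; the names are approximate and this is not the actual SequencerInbox code):

// Inside forceInclusion(), update the buffer against the oldest
// not-yet-included delayed message, as delayProofImpl() does, instead of
// against the (possibly much newer) message supplied by the caller:
//
// before: buffer.update(newestMessageBlockNumber);        // skips intervals
// after:  buffer.update(oldestPendingMessageBlockNumber); // no interval skipped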

Picodes commented 1 month ago

@Simao123133 thanks. I think the main scenarios under which this happens are clear, and I feel there is a paradox in your arguments: on the one hand you consider that messages aren't force included as soon as they can be, because of gas prices or whatever reason, and that's not really an issue; on the other hand you're saying that a crucial functionality is broken if, because of these delays, the delay buffer isn't reduced as much as it could be.

Simao123133 commented 1 month ago

@Picodes thank you for the feedback. There is no paradox; maybe I can add a bit more context. Users can enqueue delayed messages to the sequencer on the L1, adding them to the delayedInboxAccs. Consider that a few messages have been added to the delayedInboxAccs. Each message m1, m2 or m3 is only ready to be force included at some timestamp T1 < T2 < T3. This issue happens whenever the timestamp reaches T2 and m2 is force included (which also force includes m1), and the buffer is not depleted for the time delta T2 - T1, as it completely skips T1 and stores T2 (the buffer is calculated on T2). If users force include m1 as soon as the timestamp hits T1 and then m2 once it reaches T2, this issue will not happen, as the buffer is calculated correctly on T1, T2, and so on. Thus, the only requirement for this issue to happen is not force including right away.

Now, the suggested fix of the sponsor is an honest party force including messages right away. A specification of the force inclusion method is that messages can be included in batches. Thus, that fix would break this specification, as messages would no longer be force included in batches, unless users accept harming themselves and waiting longer because the buffer is not correctly depleted. So: include in batches -> this bug; fix this bug by including sequentially -> break the specification that messages may be included in batches. Either way the spec is broken; no paradox.

c4-judge commented 1 month ago

Picodes marked the issue as grade-b

Simao123133 commented 1 month ago

@Picodes kindly ask to give this another thought, I will not comment any further unless requested to. Thank you once again for your time.

There are several comments in the codebase specifically saying the buffer should be depleted and this report shows how it is possible for the buffer to not be depleted at all given a certain requirement.

There is a correct fix that is applied in the delayProofImpl() but not in forceInclusion().

forceInclusion() specifically handles batches of messages, so this issue is likely to happen. It affects the availability of force inclusion for users and, as such, should be a Medium. If users expect inclusions to take 30 minutes but they end up taking 12 hours, it is an availability issue.

The other low issues are either misconfigurations or unexpected flow paths, so this one should not be grouped within the same category.

[1]

Messages are expected to be delayed up to a threshold, beyond which they are unexpected and deplete a delay buffer.

[2]

Conditionally updates the buffer. Replenishes the buffer and depletes if delay is unexpected.

[3]

The delay buffer can change due to pending depletion / replenishment due to previous delays.

[4]

The delay buffer can change due to pending depletion/replenishment. This function applies pending buffer changes to proactively calculate the force inclusion deadline.

[5]

proactively apply any pending delay buffer updates before the force included message l1BlockAndTime

Picodes commented 1 month ago

@Simao123133 Thanks for your message.

This issue is to me somewhere between "function incorrect as to spec, state handling", and "availability issue".

Considering:

c4-judge commented 1 month ago

This previously downgraded issue has been upgraded by Picodes

c4-judge commented 1 month ago

Picodes marked the issue as satisfactory

c4-judge commented 1 month ago

Picodes marked the issue as selected for report

godzillaba commented 1 month ago

Hi @Picodes @Simao123133

that the impact of the end-user not being able to force-include a message as soon as possible is that its funds may be locked for some time

I'm not sure this is true. In the example @Simao123133 gave with 3 messages (m1, m2, m3 at T1, T2, T3), m1 not being included right away does not cause funds to be locked, since m1 is included when m2 is included at T2, and m1 could have been included at any time between T1 and T2.

Picodes commented 1 month ago

@godzillaba I meant locked for some time. Funds can't be withdrawn as soon as they should be, because the buffer isn't depleting as fast as it should.

godzillaba commented 1 month ago

If the same user (e.g. someone playing in an L3's BoLD game) submits m2 and m3, and they force include only their own messages promptly (as in the example where there is a claimed bug), they are not affected. That user, the force includer, can still only experience some maximum amount of sequencer censorship over a given time period.

gzeoneth commented 1 month ago

Arbitrum's protocol has always been optimistic; we build most of our logic under the assumption that an honest participant exists. e.g. https://github.com/code-423n4/2024-05-arbitrum-foundation/blob/main/README.md

It is assumed ... honest validators can react to malicious action broadcasted on-chain immediately in the same block unless the malicious party spend their censorship budget.

This is unless the function is designed to be non-admin permissioned, in which case we enforce stricter guarantees. e.g.

There is a correct fix that is applied in the delayProofImpl() but not in forceInclusion().

The batch poster's delayProofImpl has the "correct fix" where the permissionless forceInclusion does not. This is by design, to limit the complexity of the contracts.

If anyone is concerned about the total delay of a delayed message, it is assumed they SHOULD force include as soon as possible. If they do not call force inclusion, there is no protocol guarantee that their message will ever be included. Any delay in calling forceInclusion on the parent chain can be modeled as part of the censorship budget of an attacker.

Simao123133 commented 1 month ago

The comments all over the codebase indicate that the buffer should be depleted when enough delay passes. This report proves that there are scenarios where the buffer is not depleted even though enough delay has passed. Everything else is subjective, but this statement is factual.

Moreover, honest participants may work around it, but at the cost of protocol functionality, namely not including messages in batches.

xuwinnie commented 1 month ago

Hey @Simao123133, it seems the report doesn't illustrate in detail how "buffer not depleted" leads to "delay of force inclusion". In your first illustration, the gap between t4 and t3 is much larger than the gap between t5 and t4. The large gap means the delay buffer has already been replenished, while the small gap means the second delayed message has not waited long enough, so force including the second message probably won't succeed. (If the actual ratio is not like the illustration shows, the impact would be smaller.) These are just my rough observations, but if you can describe the whole scenario in more detail, including how exactly the value of the buffer changes, the exact time period the sequencer is down, how the attacker behaves, and the exact impact on the innocent user, we could have a better view of the impact.

Simao123133 commented 1 month ago

@xuwinnie in the example, the user at T3 decided not to include their message right away, and the user at T4 enqueued their message some time after. Then the T4 user decided to force include T4 right away at T5 (which also includes T3, as it includes all messages up to T4). Hence, the delay T4 - T3 is skipped.

xuwinnie commented 1 month ago

Thanks for replying, but I still feel the explanation (and the report) doesn't have sufficient proof.

  1. It does not explain why the replenish rate, threshold, and max can be ignored.
  2. It stops at "delay T4 - T3 is skipped", without further demonstrating the direct impact to the user (the two messages in the illustration themselves are not affected).
  3. It does not describe the time period during which the sequencer is down, which makes the report highly hypothetical. It does not prove how the force inclusion process can be delayed for a significant time in reality.

Simao123133 commented 1 month ago

I believe all parties involved have enough information to understand the issue and its impact.

Picodes commented 1 month ago

Thanks for replying, but I still feel the explanation (and the report) doesn't have sufficient proof.

  1. It does not explain why the replenish rate, threshold, and max can be ignored.
  2. It stops at "delay T4 - T3 is skipped", without further demonstrating the direct impact to the user (the two messages in the illustration themselves are not affected).
  3. It does not describe the time period during which the sequencer is down, which makes the report highly hypothetical. It does not prove how the force inclusion process can be delayed for a significant time in reality.

The PoC could indeed have been a bit clearer, but I think the impact is quite clear: subsequent messages (not the ones in the schemes) face a longer delay before being force included, and therefore the max impact is that these users can't withdraw or move their funds as soon as they could have.

xuwinnie commented 1 month ago

Hi @Picodes, I made an illustration highlighting the impact of this issue: [illustration: DelayBufferExplanation]

If we agree that ignoring the replenish rate, threshold and max is acceptable (which is what this report implies), then I'll show this bug has zero impact.

Suppose t4 - t3 = 1 day and t5 - t4 = 0.3 day. The report implies we can successfully force include the second message, which means t5 - t4 > newBuffer, so newBuffer < 0.3 day and the new recorded delay is 0.3 day. If we force include the first message instead, the newBuffer will remain the same (the same as when we force include the second message), but the new recorded delay will be 1 day. The report suggests that the new delay being 0.3 day instead of 1 day will harm subsequent users who try to force include. However, we already know the new buffer is less than 0.3 day, so in the next attempt to force include, new buffer - 0.3 day and new buffer - 1 day will both be zero. There is no harm.

In this sense the report seems self-contradictory, since it gives up too many details.

Simao123133 commented 1 month ago

@xuwinnie there is a PoC proving the impact.

xuwinnie commented 1 month ago

I agree, but I mean the impact your PoC demonstrates (the new delay being 0.3 day instead of 1 day) actually causes no harm.

Simao123133 commented 1 month ago

The illustrations served the purpose of showing that the delay buffer will be incorrect, which will impact the buffer depletion and is the core of this issue.

Picodes commented 1 month ago

@xuwinnie I am not sure I see why it wouldn't impact the following user. Looking back at the coded PoC, the delay is indeed different, and buffer.prevBlockNumber and self.prevSequencedBlockNumber have the same value since the last message was processed last in both cases.

xuwinnie commented 1 month ago

seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 500), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );

Yes, the delay buffer will be 14400 and 13478 in these two cases. But the call above succeeds, which implies the next recorded delay will definitely be larger than 14400, so there's basically no difference whether we use 14400 - 14400 or 13478 - 14400 in the line "buffer -= unexpectedDelay": either way the next buffer will saturate at the threshold. Replenishment is not considered above, but we can conclude that if someone calls force inclusion shortly after that (the scenario the PoC describes) there's almost no impact. Only if nobody calls force inclusion after that, and replenishment takes place, could the two cases differ (14400 + 2000 - 14400 and 13478 + 2000 - 14400 will have different results). But I doubt whether this could be called a bug: since nobody tried to force include during those 2000 blocks, it's very likely they don't care at all; otherwise they would have tried earlier. Why can we say 14400 + 2000 - 14400 is wrong and 13478 + 2000 - 14400 is right? They are just two different outcomes. To conclude, if a user wants to be force included asap, they should call force inclusion earlier. If the user calls force inclusion promptly, they won't be affected by this issue.
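
For reference, the saturating subtraction being described can be sketched as follows (a hypothetical helper; it assumes depletion floors at the threshold and is not the verbatim DelayBuffer code):

function applyDepletion(
    uint64 buffer,
    uint64 unexpectedDelay,
    uint64 threshold
) internal pure returns (uint64) {
    // saturating `buffer -= unexpectedDelay`, floored at `threshold`
    uint64 depleted = buffer > unexpectedDelay ? buffer - unexpectedDelay : 0;
    return depleted > threshold ? depleted : threshold;
}
// applyDepletion(14400, 14400, 600) == 600
// applyDepletion(13478, 14400, 600) == 600  -> same outcome either way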

Picodes commented 1 month ago

Indeed, sequenced - start would be the delay of the last message included, so in our case it is greater than the buffer, and the result should saturate at the threshold.

c4-judge commented 1 month ago

Picodes changed the severity to QA (Quality Assurance)

Picodes commented 1 month ago

Reverting to QA following this remark and the fact that indeed the next call to forceInclusion has no reason to be impacted.

Simao123133 commented 1 month ago

@Picodes the comments above are not true. I built a PoC with a third message to prove that the buffer will be 14400 at the end if we do not include sequentially (when it should be 13385, as the sequencer was down and there were delays). There are 3 messages: if the second one is force included, followed by the third one, the final delay buffer is 14400 (wrong). If the first message is force included, followed by the third one, the final delay buffer is 13385 (correct).

This risk is real as can be proven by the POC below.

This comment still holds.

function test_POC_InconsistentBuffer_Decrease() public {
    BufferConfig memory configBufferable = BufferConfig({
        threshold: 600, //60 * 60 * 2 / 12
        max: 14400, //24 * 60 * 60 / 12 * 2
        replenishRateInBasis: 714
    });

    (SequencerInbox seqInbox, Bridge bridge) = deployRollup(false, true, configBufferable);
    address delayedInboxSender = address(140);
    uint8 delayedInboxKind = 3;
    bytes32 messageDataHash = RAND.Bytes32();

    (uint64 bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    vm.startPrank(dummyInbox);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.roll(block.number + 1100);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.roll(block.number + 100);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.stopPrank();

    vm.roll(block.number + 500);

    uint256 delayedMessagesRead = bridge.delayedMessageCount();

    // buffer is not decreased if the first and second messages are included at once
    seqInbox.forceInclusion(
            delayedMessagesRead - 1,
            delayedInboxKind,
            [uint64(block.number - 600), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    // buffer is still not decreased when the remaining (third) message is included
    seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 500), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    /*
    // buffer is decreased if the first message is included first
    seqInbox.forceInclusion(
            delayedMessagesRead - 2,
            delayedInboxKind,
            [uint64(block.number - 1700), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 14400);

    seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 500), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
        );
    (bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, 13385);*/
}

Picodes commented 1 month ago

@Simao123133 can you please edit your PoC to show a force inclusion reverting in one scenario because of the delay, whereas in the other it works? That is what we want to demonstrate here.

Simao123133 commented 1 month ago

@Picodes what do you mean? The PoC shows the buffer is not correctly depleted in some instances, which goes against the evidence in the codebase. Users have to wait 1000 more blocks than they should. That comment is purely speculative; the fact is the codebase expects the buffer to be depleted, but this is not always the case. There is plenty of evidence saying this, as pointed out before.

Picodes commented 1 month ago

It's clear from this conversation that the question is now whether the following users have a reasonable chance of seeing their messages delayed and their funds inaccessible for some time; that's what @xuwinnie's point is about. So in the PoC I'd like to see a force inclusion reverting in one case and not the other, and not only a different value of bufferBlocks.

Simao123133 commented 1 month ago

These two points still hold:

that the impact of the end-user not being able to force-include a message as soon as possible is that its funds may be locked for some time

that if the sequencer is down, there may be multiple messages to force-include, and that the depletion is advertised in various places but is non-deterministic and could be "manipulated" to delay the following message, I'll upgrade to Med.

Any argument saying that the codebase sees this bug as some kind of feature for honest participants to fix is completely speculative; the evidence says the buffer should be depleted, period.

Simao123133 commented 1 month ago

So in the PoC I'd like to see a force inclusion reverting in one case and not the other, and not only a different value of bufferBlocks.

The buffer has to be depleted further for this to happen, which is much harder because of this bug. I can build a PoC showing this, but I don't see the point.

Picodes commented 1 month ago

@Simao123133 I understand your frustration but please keep your comments fact-based. @xuwinnie raises a very valid doubt about this report's actual impact, which was why I switched from QA to Medium.

Simao123133 commented 1 month ago

Here is a PoC showing that the buffer is correctly depleted only if messages are included sequentially; it gets depleted by actively including messages sequentially. If the second message in the loop is included directly instead, the buffer is not correctly depleted and, instead of ending up at the minimum, is a much larger value (7320 instead of 600). Thus, users have to wait 2000 blocks instead of 600. It can be confirmed that initially they have to wait 2000 blocks, but in the end only 600, if we include sequentially and avoid this bug. If we don't include sequentially, they still have to wait 2000 blocks.

We can play around with the numbers and observe different outcomes, but this is a real, proven risk: here users have to wait 2000/600 = 3.3 times longer to get their transaction force included. This issue will happen; how it happens depends on the conditions. The buffer being depleted is an expected scenario and the code is built to handle it, so arguments based on the claim that this will never happen do not stand. The impact depends on delayBlocks, threshold, max, replenishRateInBasis and user behaviour. We can find many combinations showing very strong impact, as well as some showing less impact, but we know for sure this risk exists. A list of parameters or user behaviours that increase or decrease the impact could be made, but it would not erase the fact that this risk exists. It fits exactly the definition of a medium severity issue:

with a hypothetical attack path with stated assumptions, but external requirements.

function test_POC_InconsistentBuffer_Decrease() public {
    bool fix = false;
    maxTimeVariation.delayBlocks = 2000;
    BufferConfig memory configBufferable = BufferConfig({
        threshold: 600, //60 * 60 * 2 / 12
        max: 14400, //24 * 60 * 60 / 12 * 2
        replenishRateInBasis: 714
    });

    (SequencerInbox seqInbox, Bridge bridge) = deployRollup(false, true, configBufferable);
    address delayedInboxSender = address(140);
    uint8 delayedInboxKind = 3;
    bytes32 messageDataHash = RAND.Bytes32();

    for (uint i = 0; i < 7; i++) {
        vm.startPrank(dummyInbox);
        bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
        vm.roll(block.number + 1100);
        bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
        vm.stopPrank();
        vm.roll(block.number + 2001);
        uint256 delayedMessagesRead = bridge.delayedMessageCount();
        if (fix) {
            seqInbox.forceInclusion(
                    delayedMessagesRead - 1,
                    delayedInboxKind,
                    [uint64(block.number - 3101), uint64(block.timestamp)],
                    0,
                    delayedInboxSender,
                    messageDataHash
            );
        }
        seqInbox.forceInclusion(
                delayedMessagesRead,
                delayedInboxKind,
                [uint64(block.number - 2001), uint64(block.timestamp)],
                0,
                delayedInboxSender,
                messageDataHash
        );
    }
    (uint256 bufferBlocks, ,,,,) = seqInbox.buffer();
    assertEq(bufferBlocks, fix ? 600 : 7320);

    vm.startPrank(dummyInbox);
    bridge.enqueueDelayedMessage(delayedInboxKind, delayedInboxSender, messageDataHash);
    vm.stopPrank();
    vm.roll(block.number + 601);
    uint256 delayedMessagesRead = bridge.delayedMessageCount();

    if (!fix) vm.expectRevert(ForceIncludeBlockTooSoon.selector);
    seqInbox.forceInclusion(
            delayedMessagesRead,
            delayedInboxKind,
            [uint64(block.number - 601), uint64(block.timestamp)],
            0,
            delayedInboxSender,
            messageDataHash
    );
}

Picodes commented 1 month ago

@xuwinnie with the above PoC, in a longer sequence the effect on sequenced - start indeed gets neutralized, and we're back to a DoS issue.