celestiaorg / celestia-app

Celestia consensus node
https://celestiaorg.github.io/celestia-app/
Apache License 2.0
345 stars 289 forks source link

Potential DoS vector: blobs that are too big to be included in a square #3072

Open rootulp opened 9 months ago

rootulp commented 9 months ago

Problem

Originally reported via @tuxcanfly. See https://gist.github.com/tuxcanfly/207ab3dc1fac4bb9c38bedad2a053f18

Proposal

Is it possible for CheckTx to reject this type of transaction prior admittance into the mempool?

rootulp commented 9 months ago

Note we already have an ante handler that rejects txs where the total blob size is too large. It's possible that ante handler doesn't catch edge cases near the total blob size because it derives an upper bound based on a number of parameters. See

cmwaters commented 9 months ago

I think we calculate the space taken by the blobs incorrectly https://github.com/celestiaorg/celestia-app/blob/2d261a2a0f7f627044342929b534a16b762c47e0/x/blob/ante/max_total_blob_size_ante.go#L74-L79

it should be in regards of the amount of shares * 512 bytes i.e.

for _, size := range sizes {
    sum += uint64(shares.SparseSharesNeeded(size)) * appconsts.ShareSize
}

Also I would consider of adding an extra share or two as buffer for padding

rootulp commented 9 months ago

getTotal is only used here where we want the number of total bytes in the blob sizes.

Also I would consider of adding an extra share or two as buffer for padding

Agreed. That ante handler doesn't account for padding shares so this is related to https://github.com/celestiaorg/celestia-app/issues/3081

cmwaters commented 9 months ago

getTotal is only used here where we want the number of total bytes in the blob sizes.

yes but we want the total bytes based on the amount of shares that those blobs will occupy; not the raw bytes of the blobs

rootulp commented 9 months ago

🤦‍♂️ you're right, good catch!

rootulp commented 8 months ago

This issue may be partially mitigated by the fix in #3082 which will be released in the next major release.

evan-forbes commented 8 months ago

@rootulp since this is a bug, is it okay if we treat is as such? and is this still an open issue given #3082?

rootulp commented 8 months ago

TBH I still think this is open. #3082 improves the ante handler to reject PFBs that are guaranteed to not fit in a data square. But I think the DoS vector still exists and a malicious actor could craft PFBs that pass the ante handler but fail to fit in a data square.

Agreed it is a bug. What do you mean by "treat it as such"?

evan-forbes commented 8 months ago

Agreed it is a bug. What do you mean by "treat it as such"?

mainly that we just add a label lol and discuss what is needed to fix it ofc. that is weird wording tho, apologies

do we need more tests than what were added in #3082? what do we need to do to be confident that this isn't occuring? I could be missing something, but we should be able to deterministically test for this, correct?

rootulp commented 8 months ago

what do we need to do to be confident that this isn't occuring?

One more idea to mitigate the DoS: modify BroadcastTx from blocking to non-blocking

do we need more tests

We could use a fuzz test that submits blobs to the ante handler and verifies that if a valid PFB tx passes the ante handler then it also must be able to be included in an empty block (i.e. a block where it is the only tx).

evan-forbes commented 8 months ago

One more idea to mitigate the DoS: modify BroadcastTx from blocking to non-blocking

does the signer in the user packager using BroadcastMode Sync count for non-blocking? In general I agree that we should get rid of broadmost mode block, and perhaps a stupic question, but how does this reduce DoS? just that it reduces the resources required per connection? If so, we do have configurable limits on the time each node will wait and how many active connections it maintains help?

We could use a fuzz test that submits blobs to the ante handler and verifies that if a valid PFB tx passes the ante handler then it also must be able to be included in an empty block (i.e. a block where it is the only tx).

adding a fuzzer is a good idea! for this specific issue, perhaps we could limit to the scope to just the size of the PFB and the problematic antehandler?

evan-forbes commented 8 months ago

to close issue, we need to add a test case for the boundary of max blob size in this test https://github.com/celestiaorg/celestia-app/blob/a93bb625c6dc0ae6c7c357e9991815a68ab33c79/x/blob/ante/blob_share_decorator_test.go#L21