celestiaorg / celestia-core

A fork of CometBFT
Apache License 2.0
488 stars 267 forks source link

Propagate an error message when a tx already exists in the mempool #964

Open rootulp opened 1 year ago

rootulp commented 1 year ago

Context

https://github.com/rollkit/rollkit/issues/725

Problem

When clients attempt to PayForData in celestia-app, celestia-core doesn't return a full error message when a tx already exists in the mempool. The celestia-appd binary will log:

6:26PM ERR Error on broadcastTxCommit err="tx already exists in cache" module=rpc

but celestia-core returns to celestia-app something like

[rootulp] txResp: tx_response:<txhash:"7AE1E7393D312FFA9E9965DD5958FE83E0A741A0F3345B68A2F7F6039EECB396" codespace:"sdk" code:19 > , err <nil>

[rootulp] core_access.go SubmitPayForData response: code: 19
codespace: sdk
data: ""
events: []
gas_used: "0"
gas_wanted: "0"
height: "0"
info: ""
logs: []
raw_log: ""
timestamp: ""
tx: null
txhash: 7AE1E7393D312FFA9E9965DD5958FE83E0A741A0F3345B68A2F7F6039EECB396
, err: <nil>

which means celestia-app can't forward a full error message to celestia-node or rollkit.

Proposal

Modify the txResponse to include a descriptive error message or return an error. Prior to making any changes, investigate if the relevant code path has been modified from vanilla tendermint/tendermint.

evan-forbes commented 1 year ago

responding with more human readable errors is definitely something that would be nice, and perhaps we can do something further with the cli or the sdk, but we are returning the error code 19

https://github.com/celestiaorg/cosmos-sdk/blob/0aef7ba8d215a3978de8c298dc40d9fa65041695/types/errors/errors.go#L95

apologies if this is already being done, but are we checking the abci code after submitting a tx? I couldn't see this step in the referenced issue

rootulp commented 1 year ago

but are we checking the abci code after submitting a tx

Which component do you expect to check the ABCI code? For other types of transaction errors, an appropriate error message is bubbled up all the way up to Ethermint. For example:

7:13PM ERR DA layer submission failed error="Codespace: 'sdk', Code: 4, Message: signature verification failed; please verify account number (0) and chain-id (): unauthorized" attempt=9 module=BlockManager
6:45PM ERR DA layer submission failed error="Codespace: 'sdk', Code: 13, Message: insufficient fees; got:  required: 6000utia: insufficient fee" attempt=1 module=BlockManager

so I'm concerned the "tx already exists in mempool" error is handled differently in celestia-core which leads to a lack of error message.

evan-forbes commented 1 year ago

Which component do you expect to check the ABCI code?

if its json, then the code field, but if grpc, then we should get https://github.com/celestiaorg/celestia-core/blob/07b8dbcb153563db8f2429b94a632b23d2621519/proto/tendermint/abci/types.proto#L241

an example would be https://github.com/celestiaorg/celestia-app/blob/a113f5ac8ad5f336f419d47cf55e0510df550490/app/test/integration_test.go#L167-L169

so I'm concerned the "tx already exists in mempool" error is handled differently in celestia-core which leads to a lack of error message

I see, that makes sense

evan-forbes commented 1 year ago

I'm assuming that at least the code is returned back because of

[rootulp] core_access.go SubmitPayForData response: code: 19

rootulp commented 1 year ago

Yep the Cosmos SDK error code is returned in the tx response so it is possible to wrap the tx response with an error message in celestia-app somewhere after this line (i.e. with a switch statement using the Cosmos SDK errors here) but that seems less ideal than having celestia-core return the error message along with the error code.

evan-forbes commented 1 year ago

we might want to upstream this, and it sounds like this is only due this specific error in the mempool

cmwaters commented 1 year ago

I guess this isn't a problem if you use the Tendermint RPC to call BroadcastTx?

I think the culprit is here: https://github.com/celestiaorg/cosmos-sdk/blob/0aef7ba8d215a3978de8c298dc40d9fa65041695/client/broadcast.go#L111

I'm not sure why you would want to extract out the error returned by tendermint and put it in the TxResponse instead of just returning the error as is.

I guess here: https://github.com/celestiaorg/cosmos-sdk/blob/0aef7ba8d215a3978de8c298dc40d9fa65041695/client/broadcast.go#L57 we should add the actual error message to the RawLog field in TxResponse so at least it's more human readable.

Note: the current approach, as the TODO implies, is quite brittle. With the CAT mempool, a different error message was used to indicate that the tx was already in the mempool. This won't get caught in the switch statement and thus I believe it will be returned as a regular error.