gnolang / tm2-js-client

The Tendermint2 JS/TS client (SDK)
https://gno.land
Apache License 2.0
10 stars 4 forks source link

Error handling in the sendTransaction function #46

Closed jinoosss closed 1 year ago

jinoosss commented 1 year ago

Currently, providers only validate for success/failure of calls in general. However, when we call broadcast_tx of the sendTransaction function, we can get an error about the transaction as a response. This error cannot be checked in the block, so I think error handling is necessary.

@zivkovicmilos I have a few questions.

  1. Is it intended that when a transaction error occurs, a hash is the response?
  2. Do you think tm2-js-client should handle gno's errors? (If we use the gno response as is, the error types need to be checked in the gno repository and handled) ** gno errors: https://github.com/gnolang/gno/blob/0b38c8d793f83d50929ca27351f2b3cb89e9b974/tm2/pkg/std/errors.go
zivkovicmilos commented 1 year ago

Hey @jinoosss,

Thank you for opening up the issue, these are completely valid questions 🙏

I definitely believe the transaction send method definitely should alert of any errors that occurred during broadcast. This is probably an oversight on my part when I was implementing it, as the hash information is the only one you'd need for the happy path of sending a transaction (and obvious axios errors are handled by the RestService)

As for the error handling, I'm not sure there are any Gno-specific errors, all of these seem to be TM2 related. Am I wrong here? In any case, the tm2 package should handle tm2 errors, and the gno package should be able to handle gno-specific errors

I will look into adding this functionality soon, or we can coordinate together if you'd like to take a go at it 💪

jinoosss commented 1 year ago

Thanks for the comment.

I'd be pleased to coordinate with you. And I agree with your comments.

Do you have a way of handling TM2 errors in mind?

I personally thought about throwing an error (based on the TM2 response error type). What do you think about this idea?

zivkovicmilos commented 1 year ago

Hey @jinoosss,

Sorry for the somewhat late reply on this issue. I like the idea of throwing an error, I think it's one of the most cleanest ones and doesn't change the API drastically

I need to look into how these errors are returned to clients, as I suspect they're being encoded in amino (we might need to proto them out into a JS type). I'll open a draft PR and add you to get your opinion on changes 🙏

zivkovicmilos commented 1 year ago

@jinoosss,

Just verified it locally and have more information on this issue.

The problem lies with how broadcast_tx_sync (and broadcast_tx_async) work. There are essentially 2 types of checks being done on a transaction, depending on the context (how it's sent, and at which point in the tx pipeline is the tx):

Simulation checks

These types of transaction checks actually run the message associated with the transaction. What does this mean? If Alice sends Bob 10GNOT, it's in the form of a MsgSend, that is associated with the transaction. When a transaction is being simulated, the message (or rather, the relevant modules) are being executed to verify that it doesn't error out. In the case of regular fund transfers like with Alice and Bob, the banker would verify that there is enough funds on Alice's account before the transaction is a success

Basic TX verification

This type of transaction check simply verifies that the transaction is valid enough to enter the transaction pool (mempool), and be picked up later by the block building process. These checks include, and are not limited to:

You will notice that this type of check does not execute the actual transaction messages (ie. fund transfers).

The problem

This sort of brings us to the problem at hand: broadcast_tx_async and broadcast_tx_sync only execute the basic tx verification before giving a response to the user on the HTTP / WS endpoint (client). In our example, if Alice sent this transaction for transferring funds to Bob (without having the funds), she would get a success response because only basic verification was done on her transaction, instead of the more thorough simulation. This basic tx verification is denoted as CheckTX with the RunTxModeCheck mode.

broadcast_tx_commit waits for the transaction to be committed to a block before returning the response to the client. At this point, the transaction is actually simulated, and it would error out potentially if the tx messages are not valid. This is the least favorable endpoint to use for clients, because it blocks the client response until a transaction (if ever) is committed to a block in the chain.

tm2-js-client relies on broadcast_tx_sync when sending out transactions, so it could never catch this sort of post-process check (message execution).

What can we do?

The way I see it, there are 2 action points:

  1. Throw an error in sendTransaction if the BroadcastTxResult.error is not nil - this is the situation where the transaction is just not valid at a base level (invalid signatures, insufficient fees, invalid tx fees...) (these are tm2-js-client changes)
  2. Support transaction receipts (or some notion of it) on gno - this is completely separate from the clients, and requires changes on the core gno codebase, where you'd expose an endpoint similar to Ethereum's eth_getTransactionReceipt

I need to sync with the core team to figure out how to tackle the second point, as it's a much bigger change than the first one. I will open up a small PR that adds support for the action point at number 1, as it's super simple to implement

zivkovicmilos commented 1 year ago

I've opened up a small PR to address point 1:

https://github.com/gnolang/tm2-js-client/pull/59

Please let me know what you think 🙏

jinoosss commented 1 year ago

@zivkovicmilos,
Thanks for identifying the problem 🙏

I think it's a good idea to throw a clear error so that it can be used without knowing the core of the chain. I think it would be more useful to make it so that you only need to know tm2-js-client to handle the error.

I'll take a closer look at the PR you suggested and offer further comments.

jaekwon commented 1 year ago

BTW please take a look at the comments in tm2/pkg/bft/rpc/core/mempool.go for details of response format for broadcasttx* calls.

broadcast_tx_commit waits for the transaction to be committed to a block before returning the response to the client. At this point, the transaction is actually simulated, and it would error out potentially if the tx messages are not valid.

It runs the tx in a cachewrapped context so that all the changes can be discarded if there is an error during execution (delivertx). But for clarity I think we should not call this "simulation" since it does write if the tx succeeds, and also baseapp.Simulate() is an existing function which works similarly, except it always discards the result.

(if we wanted, we could consider exposing a "simulate" RPC endpoint, but I think that's outside the scope of this issue, and also we'd have to figure out how to throttle it somehow.)

This is the least favorable endpoint to use for clients, because it blocks the client response until a transaction (if ever) is committed to a block in the chain.

broadcast_tx_commit isn't the best for production usage because it runs a redundant checktx step. That is, it first runs checktx, and if that succeeds, then runs delivertx, but that means the ante handler is run twice.

(I think we could just modify broadcast_tx_commit to make the checktx step optional, if I'm not mistaken. Also we can take a look at tm1 or comet to see what they do.)

But in order to get the actual result value, of course we do have to wait until the block commits (since the result can depend on prior txs). So I'm not sure it's "unfavorable" unless you have a series of txs to commit at once.

If you don't want to make a blocking rpc request like broadcast_tx_commit, then as per the comments in mempool.go you can listen to the websocket for events. If it is not acceptable to lose results upon websocket failure (since reconnecting the websocket doesn't replay old events) then we should use the event store system which is specifically for replaying the same events as the websocket (and the external indexer should use this because indexing requires impotency and thus requires event replay from last height known by the indexer).

(do we have a PR for that? I remember discussing this months ago but don't recall seeing the code for it)

Support transaction receipts (or some notion of it) on gno

The transaction receipt is already passed to tm2 as follows:

  1. sdk.Result from handler.Process (from delivertx, which includes the ante handler error if any)
  2. abci.ResponseDeliverTx (wire format from abci-app to tm2)
  3. bft.ABCIResult (tm2 stores the results from ABCI)

(See also https://github.com/gnolang/gno/issues/968 first)

All deterministic receipt information should go in all three structures... for example, tx events from the app (in the case of gno.land, these are events published via gno function calls like std.Emit() perhaps.

you'd expose an endpoint similar to Ethereum's eth_getTransactionReceipt

The results are available from the "block_results" ABCI call, but this isn't the best way to get the result of a tx based on its hash. The right rpc endpoint for that is "tx", but it is currently commented out (see tm2/pkg/bft/rpc/core/tx.go) because the tx_indexer logic was removed.

Has there been any work on the replayable event store and external tx indexer? Maybe the external tx index is what should ultimately serve results from tx hashes. This rpc endpoint would arguably get hammered by user clients, so it might make sense to offload that into the external tx indexer which can solve for throughput, rather than requiring a cluster of tm2 nodes to do the same.

zivkovicmilos commented 1 year ago

@jaekwon

Has there been any work on the replayable event store and external tx indexer? Maybe the external tx index is what should ultimately serve results from tx hashes. This rpc endpoint would arguably get hammered by user clients, so it might make sense to offload that into the external tx indexer which can solve for throughput, rather than requiring a cluster of tm2 nodes to do the same.

Based on our discussions on https://github.com/gnolang/gno/pull/546, there is work being done on two fronts: