anoma / namada

Rust implementation of Namada, a Proof-of-Stake L1 for interchain asset-agnostic privacy
https://namada.net
GNU General Public License v3.0

`namada tx-result` times out on old tx hashes #3931

Open dan-u410 opened 1 month ago

dan-u410 commented 1 month ago

Bug: namada tx-result never returns information about the transaction. It times out instead.

To reproduce:

  1. Submit any tx on chain.
  2. Query the result of that tx; the query times out.
  3. The tx itself has not actually timed out, as the side effects of transfers/bonds are still visible on-chain.
$ namadac tx-result --tx-hash <tx_hash> \
 --node <node>

Checking if tx <tx_hash> is applied...
Transaction status query deadline of Instant { tv_sec: 4456, tv_nsec: 894543958 } exceeded
Timed out waiting for tx to be applied

Further confusion, though less critical than the above:

  1. the output of a transparent-transfer tx provides 2 transactions:
    $ namadac transparent-transfer ...
    ...
    Transaction added to mempool.
    Transaction hash: 69EA8EA284050FF494328AC5CE1AF8070129F5F2E9E8E0A8F1392804AE53D3FC
    Transaction 4125FD5375FE9B565A7F6A6B39E2B8EBD840A5665FEDC7F54F46D4F5006F70D5 was successfully applied at height 23120, consuming 35626 gas units.
  2. I'm guessing that one is the fee wrapper and one is the tx itself, but no information is displayed for querying either, resulting in confusion:
    
    $ namadac tx-result <first tx>
    ...
    Checking if tx 69EA8EA284050FF494328AC5CE1AF8070129F5F2E9E8E0A8F1392804AE53D3FC is applied...
    Transaction 4125FD5375FE9B565A7F6A6B39E2B8EBD840A5665FEDC7F54F46D4F5006F70D5 was successfully applied at height 23120, consuming 35626 gas units.

$ namadac tx-result ...
Checking if tx is applied...
Transaction status query deadline of Instant { tv_sec: 4456, tv_nsec: 894543958 } exceeded
Timed out waiting for tx to be applied

sug0 commented 1 month ago

Unfortunately, this has to do with the current implementation of checking the status of a tx. RPC queries are made to a Namada-specific event log, which is not persisted to storage, and whose capacity is limited to ~50k events (of the same type, e.g. tx/applied). Its events are replicated and persisted to CometBFT's block storage, which means we could have retrieved them from there, with some effort. Essentially, we'd need to crawl the chain backwards until we found an event associated with the given tx hash (if any). This is laborious and it has the downside of spamming RPC requests.
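To make the two behaviors above concrete, here is a minimal sketch in Rust. It is not Namada's actual code: `AppliedEvent`, `BoundedEventLog`, `crawl_backwards`, and the `fetch_block_events` callback are all hypothetical names made up for illustration. The first part models a capacity-limited, in-memory log that evicts its oldest entry when full; the second part models crawling block storage backwards, one request per block, which is where the RPC spam would come from.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified event record (not Namada's real event type).
#[derive(Debug, Clone, PartialEq)]
pub struct AppliedEvent {
    pub tx_hash: String,
    pub height: u64,
}

/// Hypothetical in-memory log with a fixed capacity and no persistence.
pub struct BoundedEventLog {
    capacity: usize,
    events: VecDeque<AppliedEvent>,
}

impl BoundedEventLog {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, events: VecDeque::with_capacity(capacity) }
    }

    /// Append an event, evicting the oldest one once the log is full.
    pub fn push(&mut self, event: AppliedEvent) {
        if self.capacity == 0 {
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    /// Look up an event by tx hash. An evicted event and a hash that was
    /// never logged are indistinguishable here: both yield `None`.
    pub fn find(&self, tx_hash: &str) -> Option<&AppliedEvent> {
        self.events.iter().rev().find(|e| e.tx_hash == tx_hash)
    }
}

/// Crawl backwards from `latest_height`, issuing one (hypothetical) request
/// per block. Returns the matching event, if any, and the request count.
pub fn crawl_backwards<F>(
    latest_height: u64,
    tx_hash: &str,
    mut fetch_block_events: F,
) -> (Option<AppliedEvent>, u64)
where
    F: FnMut(u64) -> Vec<AppliedEvent>,
{
    let mut requests = 0;
    for height in (1..=latest_height).rev() {
        requests += 1;
        let found = fetch_block_events(height)
            .into_iter()
            .find(|e| e.tx_hash == tx_hash);
        if found.is_some() {
            return (found, requests);
        }
    }
    (None, requests)
}
```

Note that in this sketch a lookup for a hash that was never applied walks all the way back to height 1, which is the worst case for the number of requests.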

dan-u410 commented 1 month ago

@sug0 I'm seeing timeouts with txs I submitted right before querying. Is that expected?

grarco commented 1 month ago

@sug0 just noticed that the last hex char of the inner tx hash differs between when he submits the tx and when he gets the result of the query back

dan-u410 commented 1 month ago

hi @grarco apologies there, I must have fat fingered something when copying/pasting.

Can confirm that the tx hashes are matching on submission/output

dan-u410 commented 1 month ago

to be precise: the fat finger was only here in the github issue - I still see timeouts when querying recently submitted transactions cc @sug0

sug0 commented 1 month ago

> to be precise: the fat finger was only here in the github issue - I still see timeouts when querying recently submitted transactions cc @sug0

ah. regarding this output:

$ namadac transparent-transfer ...
...
Transaction added to mempool.
Transaction hash: 69EA8EA284050FF494328AC5CE1AF8070129F5F2E9E8E0A8F1392804AE53D3FC
Transaction 4125FD5375FE9B565A7F6A6B39E2B8EBD840A5665FEDC7F54F46D4F5006F70D5 was successfully applied at height 23120, consuming 35626 gas units.

namada supports batches of transactions. this means that a single tx can hold many inner txs (as many as the protocol limit for the size of a single batch). only the first hash ($1 = 69EA8EA284050FF494328AC5CE1AF8070129F5F2E9E8E0A8F1392804AE53D3FC) represents the batch. the second hash you see ($2 = 4125FD5375FE9B565A7F6A6B39E2B8EBD840A5665FEDC7F54F46D4F5006F70D5) is one of the inner txs. for tracking purposes, only $1 is valid.

in essence, your tx-result query is timing out because $2 is not a valid hash for a batch tx.
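a rough sketch of the distinction, in Rust. the `BatchTx` and `HashKind` types are hypothetical and only illustrate the idea; they are not the types namada uses. the hashes in the usage note are the ones from the output above.

```rust
/// Hypothetical classification of a hash relative to one batch.
#[derive(Debug, PartialEq)]
pub enum HashKind {
    /// The hash of the batch itself: the one to use for tracking.
    Batch,
    /// The hash of an inner tx: shown in the output, not valid for tracking.
    Inner,
    /// Not part of this batch at all.
    Unknown,
}

/// Hypothetical, simplified view of a batch: one batch hash, many inner txs.
pub struct BatchTx {
    pub batch_hash: String,
    pub inner_hashes: Vec<String>,
}

impl BatchTx {
    /// Classify `hash`, ignoring ASCII case since hex may be upper or lower.
    pub fn classify(&self, hash: &str) -> HashKind {
        if self.batch_hash.eq_ignore_ascii_case(hash) {
            HashKind::Batch
        } else if self.inner_hashes.iter().any(|h| h.eq_ignore_ascii_case(hash)) {
            HashKind::Inner
        } else {
            HashKind::Unknown
        }
    }
}
```

with `batch_hash` set to $1 and `inner_hashes` containing $2, `classify` returns `HashKind::Batch` for $1 and `HashKind::Inner` for $2.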


EDIT: the issue regarding tx-result timing out for txs that have been purged from the event log still holds

dan-u410 commented 1 month ago

Okay, noted! I'll look into using an indexer for my queries. Thanks for the discussion.

sug0 commented 3 weeks ago

I think it's fine to leave this issue open. We can def improve the behavior of tx-result, even if it's just by emitting an error that mentions the tx event might have been evicted from the queue of events.
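As a rough idea of what such an error could look like, here is a small Rust sketch. The `TxResultError` type and the message wording are assumptions made up for illustration, not existing Namada code.

```rust
use std::fmt;

/// Hypothetical error type for a status query that hit its deadline.
#[derive(Debug, PartialEq)]
pub enum TxResultError {
    /// No matching event was seen before the deadline.
    DeadlineExceeded { tx_hash: String },
}

impl fmt::Display for TxResultError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TxResultError::DeadlineExceeded { tx_hash } => write!(
                f,
                "Timed out waiting for tx {} to be applied. \
                 The tx may not have been applied yet, or its event may \
                 have been evicted from the node's in-memory event log.",
                tx_hash
            ),
        }
    }
}
```

The point is only that the message names both possible causes, so the user knows a timeout does not necessarily mean the tx failed.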