Exhausted speculative trx not retried

AntelopeIO / leap

C++ implementation of the Antelope protocol

Other

116 stars 68 forks source link

Exhausted speculative trx not retried #1247

Closed heifner closed 1 year ago

heifner commented 1 year ago

When running in non-default --read-mode speculative, an exhausted trx is not retried until a block is received. A test failure where no new blocks arrived illustrated this: https://github.com/AntelopeIO/leap/actions/runs/5202501518

This condition should not be hit for read-mode=head as an exhausted trx is an exhausted block as the billed cpu/net are reset at the end of each trx.

Currently we have:

                                       if (!self->process_incoming_transaction_async(result, api_trx, return_failure_traces, next)) {
                                          if (self->_pending_block_mode == pending_block_mode::producing) {
                                             self->schedule_maybe_produce_block(true);
                                          } else {
                                             self->restart_speculative_block();
                                          }
                                       }

Where process_incoming_transaction_async returns false for an exhausted block. Instead process_incoming_transaction_async should check self->_pending_block_mode == pending_block_mode::producing and return false if exhausted block and producing but return false if not producing and either trx is exhausted or block is exhausted.

greg7mdp commented 1 year ago

@Kevin, in the last sentence of the description above, you wrote should return false for both cases. which one is wrong?

Also, in the 4.0 release notes I see:

read-mode=speculative remains supported for users who want old behavior but with an important change that transactions submitted via API endpoints will not be reapplied speculatively upon a new block being applied. The user is required to resubmit any transactions after a new block should those transactions' side effects need to be observed.

So should we still retry exhausted transactions in speculative move? When? Why would it be more likely to succeed when we retry it?

heifner commented 1 year ago

The retry talked about in the release notes is the api-persisted-trx option. See https://github.com/AntelopeIO/leap/blob/v3.2.3/plugins/producer_plugin/producer_plugin.cpp#L788 and persist_until_expired implementation in producer_plugin. This was a feature of API RPC trxs that they would be re-applied at the start of each block until they expired or were included in a block from a block producer. This allowed, for example, a user to submit a create account trx followed by a transfer to that account; this would work as long as the trxs were processed in order of sending (which is not guaranteed).

The retry talked about in this issue is the retry that happens for trxs when they do not fit into a block. When near a block boundary (CPU or NET), if the transaction exceeds the block limit it is retried in the next block. It will be more likely to succeed because it is likely to not hit the block boundary again.

Fun fact: if you run with a max-transaction-time set to larger than block time (500ms) and the transaction runs longer than block time (500ms), it will be retried over and over until it expires.

greg7mdp commented 1 year ago

Thanks a lot Kevin, this helps a lot!

What about the first question: in the last sentence of the description above, you wrote should return false for both cases. which one is wrong?

heifner commented 1 year ago

If _pending_block_mode != pending_block_mode::producing then an exhausted trx should be seen as an exhausted block since you want to retry the transaction. This should not happen when in head mode, but doesn't hurt to have the same logic no matter the mode. The key thing is to treat an exhausted trx as an exhausted block when not producing. When producing you want to fit as many trx in a block that you can. If you are not producing, then the block is only a side-effect of how trx processing is implemented; just restart the speculative block.