tx method: better handling of cached transactions that can never be validated (Version: v1.7.0-b10)

Summary

When you use the tx method on a transaction that's been cached but does not appear in any ledger, the response has a special case format that is undocumented and unhelpful.

We should add the searched_all response field to this case if you provided the min_ledger and max_ledger fields (added in 1.5.0), or remove this case so that it results in a txnNotFound error response that already has searched_all. There are some other improvements that could be helpful as well, but that's the low-hanging fruit.

Motivation

When looking up a transaction, you really would like to be able to determine its final outcome. That means separating the following three cases:

The transaction is found in a validated ledger with immutable results.
The transaction is not in a validated ledger but may become validated in the future.
The transaction is not in any validated ledger and never will be.

The difference between cases (2) and (3) can be difficult to ascertain. Generally speaking, you need to determine the lower and upper bounds for what ledger the transaction is in, then you need to confirm that you have checked all of those ledgers and the transaction is in none of them.

The min_ledger and max_ledger fields help you confirm that all the ledgers were checked if you got a txnNotFound response, but they do not help if the transaction was found, but not in any known ledger. In that case, tx returns a result such as the following:

{
  "id": 1,
  "result": {
    "Account": "rKBi32xVtepHSbtG8SciPZR7NyH7JtuUkK",
    "Amount": "10000",
    "Destination": "rPT1Sjq2YGrBMTttX4GZHjKu9dyfzbpAYe",
    "Fee": "12",
    "Flags": 2147483648,
    "LastLedgerSequence": 14069654,
    "Sequence": 14069567,
    "SigningPubKey": "038F25A3875580D145179B81CE738BAE553AF6792A33628869E8F88180C307903B",
    "TransactionType": "Payment",
    "TxnSignature": "3045022100D9BEA5BD84244FD4EEAE23BFAE111D7BD8F5DE8CA30F47E9BA0D8385A00B5C4A022011AB58E7C4C7885A5417FCC8BABBF4A9F832DF63FA2E444E323D975BE5DBAB59",
    "hash": "17C3E09F3D90D311CE4944D2BF42E8ACFC0C6205147487F2EB2BD024ECDF6F28",
    "validated": false
  },
  "status": "success",
  "type": "response"
}

Note, I believe this is a different format than the case where the transaction is already in the open ledger and is likely to be validated by consensus soon. This type of response also omits several fields (ledger_index, meta, etc.) that are currently documented as always being present in the response. Related: XRPLF/clio#722.

This response does not provide enough information to know whether the transaction has succeeded or failed, because we don't know if server has all the ledgers from the min_ledger through the max_ledger specified in the request. (See "How to Reproduce" below for details of how to re-create and test this case.)

Solution

If the response in this case (tx found, not validated) included a searched_all field similar to in the txnNotFound response, then the client could use that information to conclusively determine that the transaction will never be validated in the case where validated is false and searched_all is true. (In the case where searched_all is false, the client has to either wait or ask another server depending on why some of the ledgers weren't searched.) This is a non-breaking API change that would vastly improve the usefulness of this response case.

Alternatively, I think it would be OK to remove this "cached but not in any ledger" case and return txnNotFound if the transaction is in the cache but not in any known ledger. I'm not sure if that counts as a breaking API change; in a sense it could be considered a bug fix to remove an anomalous case that doesn't match the documented response format.

It's not possible to add the date and ledger_index fields to this response type because those fields are inapplicable to transactions in this state. So, unless we remove this case and return txnNotFound, documentation changes are also necessary.

How to Reproduce

This case can be reproduced in stand-alone mode or on an actual network such as Testnet, in several ways. Two straightforward ways to ensure that a transaction is never validated are:

Option A: Future sequence
1. Look up & note the latest validated ledger index (use this as min_ledger later)
2. Start creating a transaction (e.g. AccountSet) with its LastLedgerSequence a few ledgers ahead (note the value and use it as the max_ledger later)
3. In the same transaction, set the Sequence value to higher than the sender's current sequence number (e.g. current sequence + 50)
4. Sign and submit the transaction (expected result: terPRE_SEQ)
5. Don't send any other transactions until the LastLedgerSequence has passed. (In stand-alone mode, use ledger_accept to close ledgers; on a network, just wait.)
6. Use tx to look up the transaction by hash with min_ledger and max_ledger as noted.
Option B: Delayed submit.
1. Look up & note the latest validated ledger index (use this as min_ledger later)
2. Start creating a transaction (e.g. AccountSet) with its LastLedgerSequence a few ledgers ahead (note the value and use it as the max_ledger later)
3. Sign the transaction but don't submit it yet!
4. Wait until the LastLedgerSequence has passed. (In stand-alone mode, use ledger_accept to close ledgers; on a network, just wait.)
5. Submit the transaction. (expected result: tefMAX_LEDGER)
6. Use tx to look up the transaction by hash with min_ledger and max_ledger as noted.

In both cases, the transaction is eventually evicted from cache and the result is txnNotFound instead, but shortly after submission (the time when you're most likely to want to know a transaction's result) the tx result is the ambiguous case in the example above.

Paths Not Taken

More Aggressive Eviction of Doomed Transactions

You may notice that in the above examples, rippled is caching and returning a transaction that can never succeed. Most likely this is because the server does not know that. There are similar cases where it might be possible for the transaction to eventually become validated (e.g. if you used a future sequence and then submitted the transactions that use the sequence numbers in between). Presumably the thought process behind the current behavior was that it's better to provide as much information about the transaction as is available, but the lack of searched_all means that finding the cached-but-not-applied transaction is now actually less useful than not finding it.

The server could be better about dropping a transaction from cache when that transaction can never succeed; the hard part there is setting the lower bound for ledgers that could contain the transaction; but there are a few ways you could do this:

Throw out any cached transaction with a LastLedgerSequence less than the current validated ledger regardless of whether it could potentially, theoretically have been in a previous ledger. I think the problem with this approach is you don't necessarily know how the transaction got into the cache; it might be part of fetching a historical ledger or something, in which case you want to keep it there until you can add it to that ledger—I'm not an expert on how the cache works, so I don't know for sure if that's a thing that can happen.
Recognize when you have full, uninterrupted history and thus any transaction with a past LastLedgerSequence that's not in a ledger already can be thrown out. (Note the difference between a past validated ledger—results final—and a past closed ledger—which might get replaced.)
Triangulate the earliest ledger a transaction could occur in based on its Sequence (or TicketSequence) number, if you know in what ledger the sender's previous sequence number was used. For example, if an account used Sequence value S in ledger L, then it's completely impossible for Sequence S+1 or higher to be in ledger L-1 or lower. (Be mindful: Sequence S+1 could be in the same ledger L.)
- Tickets are slightly trickier, but the same logic applies. TicketSequence S+1 could not have possibly been confirmed before the ticket was created, which means that if an account used Sequence S is in ledger L, the ticket with TicketSequence S+1 could not exist in ledger L-1 or lower, and thus a transaction using TicketSequence S+1 cannot exist in ledger L-1 or lower.

While all this might be a useful optimization, cache management is hard, and it seems less comprehensive than just making the response more useful in the above case.

FirstLedgerSequence

A FirstLedgerSequence transaction field would put a hard lower bound on what ledger can include the transaction in the same way that LastLedgerSequence puts a hard upper bound. This has been proposed before but has not yet been implemented. I still think it's a good idea, but it would require an amendment and it doesn't fix the immediate problem with the anomalous tx response.

FirstLedgerSequence would be useful for determining the final result of a transaction, since the transaction instructions themselves become a self-contained definition of what the min_ledger/max_ledger bounds are. If a transaction has both fields, a server with the given range of ledgers on-hand could definitively say if such a transaction will never be validated. This could also help the server to decide sooner to evict these transactions from cache.

XRPLF / rippled