XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License
4.53k stars 1.47k forks source link

canonicalizing txn pointer in NetworkOPsImp::processTransaction leads to race condition/incorrect txn result (Version: 2.1.0) #4957

Open ubuddemeier opened 8 months ago

ubuddemeier commented 8 months ago

Issue Description

The call to canonicalize here: https://github.com/XRPLF/rippled/blob/master/src/ripple/app/misc/NetworkOPs.cpp#L1232 can lead to a race condition, if a transaction is processed synchronously (i.e. "local"), while the same transaction is processed again asynchronously (e.g. received from a peer). See the following log excerpt pertaining to a particular transaction. I added additional logging not present in repo, ID1 stands for the txn ID. Note how the transaction result changes while doSubmit is still busy converting the results. This ultimately leads to an incorrect result returned by doSubmit.

2024-Mar-19 15:28:02.615291761 UTC RPCHandler:NFO doSubmit calling_processTransaction TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.615395205 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction before_canonicalize TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.616145354 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction after_canonicalize TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.616840113 UTC NetworkOPs:NFO NetworkOPsImp::doTransactionSync setApplying TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.617390617 UTC NetworkOPs:NFO NetworkOPsImp::apply TX ID1 result=tesSUCCESS ptr=0x7f1bc017a320 local=true 2024-Mar-19 15:28:02.617481900 UTC NetworkOPs:NFO NetworkOPsImp::apply clearApplying TX ID1 ptr=0x7f1bc017a320 local=true 2024-Mar-19 15:28:02.638734806 UTC Protocol:NFO [003] PeerImp::handleTransaction TX ID1 2024-Mar-19 15:28:02.641427701 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction before_canonicalize TX ID1 ptr=0x7f1bc80c06d0 2024-Mar-19 15:28:02.641445326 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction after_canonicalize TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.641599727 UTC NetworkOPs:NFO NetworkOPsImp::doTransactionAsync setApplying TX ID1 ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.641772745 UTC NetworkOPs:NFO NetworkOPsImp::apply TX ID1 result=tefPAST_SEQ ptr=0x7f1bc017a320 local=false 2024-Mar-19 15:28:02.642194166 UTC NetworkOPs:NFO NetworkOPsImp::apply clearApplying TX ID1 ptr=0x7f1bc017a320 local=false 2024-Mar-19 15:28:02.638380244 UTC RPCHandler:NFO doSubmit returned_from_processTransaction TX ID1 res=tesSUCCESS ptr=0x7f1bc017a320 2024-Mar-19 15:28:02.643990745 UTC RPCHandler:NFO doSubmit done_with TX ID1 res=tefPAST_SEQ ptr=0x7f1bc017a320

Steps to Reproduce

Python script to generate requests:

import hashlib, time, random, asyncio from xrpl.wallet import Wallet from xrpl.models.transactions import Payment from xrpl.models.response import ResponseStatus from xrpl.transaction import sign from xrpl.asyncio.transaction import submit from xrpl.asyncio.ledger import get_latest_validated_ledger_sequence from xrpl.asyncio.account import get_next_valid_seq_number, get_balance from xrpl.asyncio.clients import AsyncWebsocketClient from xrpl.core.addresscodec import encode_seed from xrpl import CryptoAlgorithm from xrpl.asyncio.clients.exceptions import XRPLRequestFailureException

Note: edit xrpl/asyncio/clients/websocket_base.py as follows:

self._websocket = await websocket_client.connect(self.url, extra_headers={"X-User:" : "anon"})

url = "ws://IP_ADDRESS_GOES_HERE:60516" fee = 10 nWallets = 100 nTasks = 0 targetThroughput = 0 runTime = 60 fundAmount = 1000000000

rnd = random.Random(1)

def seed_from_pass(passphrase): return encode_seed(hashlib.sha512(passphrase).digest()[:16], CryptoAlgorithm.SECP256K1)

def wallet_from_seed(passphrase): return Wallet.from_seed(seed_from_pass(passphrase), algorithm=CryptoAlgorithm.SECP256K1)

nTxns = 0 tStart = 0 async def generate_load(client2, wallets): global nTxns, rnd nWallets = len(wallets) if nWallets == 0: return async with AsyncWebsocketClient(url) as client:
while (t := time.perf_counter()) < tStart + runTime: nTxns += 1

rate control

  if targetThroughput > 0:
    tTarget = tStart + nTxns / targetThroughput
    if tTarget > tStart + runTime:
      nTxns -= 1
      break
    if (t < tTarget):
      await asyncio.sleep(tTarget - t)
  # generate the txn
  wRecv = rnd.choice(wallets)
  iSend = rnd.randrange(nWallets)
  iOffset = 0
  while True:
    wSend = wallets[(iSend + iOffset) % nWallets]
    if not wSend.busy and wSend.address != wRecv.address:
      break
    iOffset += 1
    if iOffset == nWallets:
      iOffset = 0
      await asyncio.sleep(0)  # all wallets are busy, need to wait for one to become available
  tx = sign(Payment(account=wSend.address, amount="1", destination=wRecv.address, sequence=wSend.seq, fee=str(fee)), wSend)
  # dispatch the txn
  wSend.busy = True
  response = await submit(tx, client)
  failed = True
  if response.status == ResponseStatus.SUCCESS:
    if response.result['engine_result'] == 'tesSUCCESS':
      wSend.seq += 1
      failed = False
    else:
      if 'account_sequence_next' in response.result:
        wSend.seq = int(response.result['account_sequence_next'])
      else:
        wSend.seq = await get_next_valid_seq_number(wSend.address, client)
  wSend.busy = False
  if failed:
    print(f"wallet:{wSend.id} FAILED TX {tx.get_hash()} {response}")    

async def main():

make some wallets

print(f"Generating wallets...") master = wallet_from_seed(b"masterpassphrase") wallets = [wallet_from_seed(b"test-%d" % i) for i in range(nWallets)] for i in range(len(wallets)): wallets[i].id = i wallets[i].busy = False

async with AsyncWebsocketClient(url) as client:
if fundAmount > 0:

fund wallets

  print(f"Funding wallets...")
  masterSeq = await get_next_valid_seq_number(master.address, client)    
  for w in wallets:
    try:
      bal = await get_balance(w.address, client)
    except XRPLRequestFailureException:
      bal = 0
    if (bal < fundAmount):    
      tx = Payment(account=master.address, amount=str(fundAmount - bal), destination=w.address, sequence=masterSeq, fee=str(fee))
      masterSeq += 1
      txSigned = sign(tx, master)
    elif (bal > fundAmount):
      seq = await get_next_valid_seq_number(w.address, client)
      tx = Payment(account=w.address, amount=str(bal - fundAmount), destination=master.address, sequence=seq, fee=str(fee))
      txSigned = sign(tx, w)
    else:
      continue
    response = await submit(txSigned, client)    
    if response.status != ResponseStatus.SUCCESS or response.result['engine_result'] != 'tesSUCCESS':
      print(f"txn failed: {response.result}")
      exit(1)

print(f"Fetching sequence numbers...")
for w in wallets:
  w.seq = await get_next_valid_seq_number(w.address, client)

print(f"Running load test...")    
global nTasks
if nTasks == 0 or nTasks > nWallets:
  nTasks = nWallets
tasks = [generate_load(client, wallets) for _ in range(nTasks)]
global tStart
tStart = time.perf_counter() 
await asyncio.gather(*tasks)

print(f"Result: {nTxns} txns {nTxns / runTime} txns/s")

if name == "main": asyncio.run(main())

Expected Result

All transactions succeed.

Actual Result

Some transactions return tefPAST_SEQ although the txns have never been seen before, e.g.: (ID1 same as above)

wallet:63 FAILED TX ID1 Response(status=<ResponseStatus.SUCCESS: 'success'>, result={'accepted': False, 'account_sequence_available': 10, 'account_sequence_next': 10, 'applied': False, 'broadcast': False, 'engine_result': 'tefPAST_SEQ', 'engine_result_code': -190, 'engine_result_message': 'This sequence number has already passed.', 'kept': False, 'open_ledger_cost': '10', 'queued': False, 'tx_blob': '1200002200000000240000000961400000000000000168400000000000000A732102D6A4BE0770363D5815C5AE550C11A530E9DCEC2AAE62DFCF800583A47C44B6CD7446304402200D467F5443159BE926B3DF24E509A7F3C710BB21AB41498B16F1E36BB8CBB187022064049CCBA5B6C8ABC3057B7749733459ECF0B27C1D50099ED43EF6D9A7B0F5248114BDEDD0D17B55022C7F4F6D94752D889C7CB7929E83141AF85539F475E61DDC7BE0D454C67187FDC42A3B', 'tx_json': {'Account': 'rJKEbyiTLskpBMjHmVGqWCQ8628HK2XUQ3', 'Amount': '1', 'Destination': 'rsTchTjz2PtXyTMvHgz8Gvw65ojuUkmMvB', 'Fee': '10', 'Flags': 0, 'Sequence': 9, 'SigningPubKey': '02D6A4BE0770363D5815C5AE550C11A530E9DCEC2AAE62DFCF800583A47C44B6CD', 'TransactionType': 'Payment', 'TxnSignature': '304402200D467F5443159BE926B3DF24E509A7F3C710BB21AB41498B16F1E36BB8CBB187022064049CCBA5B6C8ABC3057B7749733459ECF0B27C1D50099ED43EF6D9A7B0F524', 'hash': 'ID1'}, 'validated_ledger_index': 13}, id='submit_267', type=<ResponseType.RESPONSE: 'response'>)

Environment

Linux ip-xx-xx-xx-xx 6.1.66-93.164.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 2 23:50:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

rippled version 2.1.0, built from branch 'master'

Supporting Files

This is on a testnet with 3 nodes. Contents of rippled.cfg:

[server] port_rpc_admin_local port_peer port_https_rpc_public port_http_rpc_public port_wss_public port_ws_public

[port_rpc_admin_local] port = 5005 ip = 127.0.0.1 admin = 127.0.0.1 protocol = https

[port_peer] port = PEER_PORT ip = 0.0.0.0 protocol = peer

[port_https_rpc_public] port = 60505 ip = 0.0.0.0 protocol = https secure_gateway = 0.0.0.0

[port_http_rpc_public] port = 60515 ip = 0.0.0.0 protocol = http secure_gateway = 0.0.0.0

[port_wss_public] port = 60506 ip = 0.0.0.0 protocol = wss secure_gateway = 0.0.0.0

[port_ws_public] port = 60516 ip = 0.0.0.0 protocol = ws secure_gateway = 0.0.0.0

-------------------------------------------------------------------------------

load testing only

[transaction_queue] minimum_txn_in_ledger = 4294967295 target_txn_in_ledger = 4294967295 normal_consensus_increase_percent = 4294967295

[node_size] small

[ledger_history] 864000

[fetch_depth] full

[node_db] type=NuDB path=SERVER_HOME/db/nudb earliest_seq=3 online_delete=864000

[database_path] SERVER_HOME/db

[debug_logfile] SERVER_HOME/logs/debug.log

[perf] perf_log=SERVER_HOME/logs/perf.log log_interval=10

[sntp_servers] time.windows.com time.apple.com time.nist.gov pool.ntp.org

[peer_private] 1

[validators_file] validators.txt

[voting] reference_fee = 10 account_reserve = 1000 owner_reserve = 10

[ssl_verify] 0

[rpc_startup] {"command" : "log_level", "severity" : "info" }

ximinez commented 8 months ago

Cool edge case.

This is why we emphasize over and over again that the result returned by RPC submit is always preliminary, and could change before the result is final. See also: https://xrpl.org/docs/concepts/transactions/reliable-transaction-submission/

As far as I can tell, this same result would be observed if the transaction came in from the network well before the transaction was submitted locally, and therefore, I don't think the result is incorrect. The user doesn't know what's happening internally, and if they submit a transaction to more than one server, they need to understand that they may get unintuitive results.

I'll leave this ticket open to give other folks a chance to weigh in, but I don't think this is a problem.

ubuddemeier commented 8 months ago

I agree that it isn't a problem from that point of view, albeit surprising to see tefPAST_SEQ for a guaranteed novel transaction. It's messy to modify data while it's being accessed by a different function, though. What else could happen?

On Tue, Mar 19, 2024, 1:30 PM Ed Hennis @.***> wrote:

Cool edge case.

This is why we emphasize over and over again that the result returned by RPC submit is always preliminary, and could change before the result is final. See also: https://xrpl.org/docs/concepts/transactions/reliable-transaction-submission/

As far as I can tell, this same result would be observed if the transaction came in from the network well before the transaction was submitted locally, and therefore, I don't think the result is incorrect. The user doesn't know what's happening internally, and if they submit a transaction to more than one server, they need to understand that they may get unintuitive results.

I'll leave this ticket open to give other folks a chance to weigh in, but I don't think this is a problem.

— Reply to this email directly, view it on GitHub https://github.com/XRPLF/rippled/issues/4957#issuecomment-2008075216, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJL6EAMFFW3D65A4ATCSLMDYZCN7PAVCNFSM6AAAAABE53S5TCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBYGA3TKMRRGY . You are receiving this because you authored the thread.Message ID: @.***>

ximinez commented 8 months ago

But it isn't novel. It's was submitted to another node before the result was returned on the first node.

What else could happen?

So it does look like Transaction doesn't use any locking, though I may have missed something. Given that, the absolute worst case I can think of is that the result fields returned by submit have a mix of the original results and modified results. That's not great, but not that big a deal - that result is preliminary anyway.

I suppose a (reader/writer) lock could be added to Transaction to ensure it doesn't get modified while the results are being read, but this situation could still very easily happen even with that mechanism in place. All it takes is for the network transaction to acquire the writer lock before doSubmit gets the reader lock. The problem is that locking and unlocking could cause a major performance hit when traffic is high, and you're multiplying that small delay by hundreds or thousands of transactions.

I still don't see this as a significant problem, because as I said above, the preliminary transaction results are not incorrect.