dimxy / komodo

Komodo
https://komodoplatform.com/
Other
7 stars 4 forks source link

Potential Deadlock detected with cs_vSend cs_vRecvMsg #71

Closed dimxy closed 2 years ago

dimxy commented 3 years ago

Received an abortion of komodod with DEBUG_LOCKORDER on, with a print on console:

POTENTIAL DEADLOCK DETECTED
Previous lock order was:
 (1) pnode->cs_vSend  net.cpp:1629 (TRY)
 cs_main  main.cpp:8649 (TRY)
 cs_wallet  wallet/wallet.cpp:3222
 (2) cs_mapRelay  net.cpp:1924
Current lock order is:
 pnode->cs_vRecvMsg  net.cpp:1610 (TRY)
 cs_main  main.cpp:7355
 (2) cs_mapRelay  main.cpp:7449
 (1) cs_vSend  net.cpp:2218
Assertion failed: (onlyMaybeDeadlock), function potential_deadlock_detected, file sync.cpp, line 127.

Env: two node chain with tokel default params (adaptivepow etc), started as tokeld -ac_name=TKL02, only one node was mining. the error occurred on the non mining node.

jmjatlanta commented 3 years ago

In which branch did this take place?

dimxy commented 3 years ago

In which branch did this take place?

it was found originally in this branch: https://github.com/dimxy/komodo/blob/691a18b98a46d47a8bdb33177960fb95b2b9e860/src/main.cpp#L7454 (this branch is for tokel project, generates tokeld/tokel-cli executables with the tokel burned params as defaults, instead of komodod/komodo-cli)

jmjatlanta commented 3 years ago

Yes, this certainly has the potential for a deadlock. IMO the easy fix is to gather the data from mapRelay, unlock, then call pfrom->PushMessage at https://github.com/dimxy/komodo/blob/746c982bade26193c21e25ec4384975699bf831d/src/main.cpp#L7449:L7454

A possible strategy to avoid this in the future (architecture change): When it comes to sending things over the wire, figure out a way to unlock almost everything and then send. What I've seen elsewhere is all network messages derive from the same class, and anything to be sent is placed in a queue. Once processing is complete, and methods "bubble back up the stack" (and thereby release their mutexes), the near last step is to lock that queue and send everything in it.

I haven't examined the comms layer to see if that is a viable strategy. I'm just throwing it out there in case it gets your creative juices flowing.

jmjatlanta commented 3 years ago

In taking a second look, my suggested fix may not work if the CDataStream is expected to be protected by cs_mapRelay.

If cs_mapRelay only protects the collection, and copies of CDataStream are expected to handle errors, we're good. Otherwise, releasing the lock could allow CDataStream to become invalid before the PushMessage is complete.

Note that CDataStream is a copy, so we don't have to worry about lifetime issues. We only need to worry if having cs_mapRelay locked means something to CDataStream.

dimxy commented 3 years ago

As looked into calls stack this occurs when: On receiving a "getdata" message for a tx in memory, relaying of this tx is triggered (so cs_mapRelay is locked and cs_vSend is locked in PushMessage) another thread, SendMessages (which already has cs_vSend locked) calls Broadcast() and it calls RelayWalletTransactions() and eventually RelayTransaction() which by the way tries to remove expired txns from mapRelay (locking cs_mapRelay) , So we could have a deadlock here.

Seems only copying CDataStream under cs_mapRelay lock and doing PushMessage after the lock released, not locking cs_mayRelay and cs_vSend at the same time, is the correct solution. Is it something like that: https://github.com/dimxy/komodo/commit/c6e017e59d6bb393b1b2f9eada2daede362208ae ? And I dont think we need to protect CDataStream, it looks it is just a serialised tx in a dedicated variable

jmjatlanta commented 3 years ago

Yes, your solution gives the desired results. The call to PushMessage will lock cs_vSend, but mapRelay has already been read and the cs_mapRelay released. Deadlock averted.

dimxy commented 2 years ago

https://github.com/dimxy/komodo/commit/91dfbd69fedcc7123913ccff0ec83d91c258db0f fixed in the tokel branch