coingaming / lnd-client

Lightning Network Daemon (LND) client library for Haskell
BSD 3-Clause "New" or "Revised" License

CloseChannel - thread blocked indefinitely in an MVar operation #98

Open tim2CF opened 3 years ago

tim2CF commented 3 years ago

Logs

[2021-08-11 12:45:14][CoinsAgent][Debug][5664004dd0fc][PID 16283][ThreadId 5692][RpcRequest:CloseChannelRequest {channelPoint = ChannelPoint {fundingTxId = TxId "\141=\217\165}F\DC3\208$\SOH\205\196\173\STX\163\248\&6\228\NUL \225\183\205-\227\&3\183#l\185\199y", outputIndex = 1}, force = False, targetConf = Nothing, satPerByte = Nothing, deliveryAddress = Nothing}][RpcName:closeChannel][LndHost::localhost][LndPort:10009][lnd-client-0.1.0.0-CEtWjRjF6USOPQj4AHX1Q:LndClient.RPC.Generic src/LndClient/RPC/Generic.hs:156:9] RPC is running

And then

Failures:

  test/TestWithMerchantPartner.hs:16:3: 
  1) CoinsAgent.Reaction.EscrowExternalPayment EscrowExternalPaymentReaction coins -> payments agent
       uncaught exception: SomeAsyncException
       ExceptionInLinkedThread ThreadId 5692 thread blocked indefinitely in an MVar operation

  To rerun use: --match "/CoinsAgent.Reaction.EscrowExternalPayment/EscrowExternalPaymentReaction coins -> payments agent/"

Randomized with seed 475671990

Finished in 114.0409 seconds
21 examples, 1 failure

Bug description

16:53 < timCF> Hello guys! I have a problem with the `thread blocked indefinitely in an MVar operation` error. In my case, at some point it is possible that all threads which can write to 
               the MVar are dead, and in this case the Haskell runtime raises this async exception. To protect my program from this, I'm using the `tryTakeMVar :: MVar a -> IO (Maybe a)` function 
               instead of `takeMVar` to handle deadlock situations. But the async 
16:53 < timCF> exception still happens from time to time. What am I doing wrong?
16:56 < int-e> timCF: Hard to say, but note that the exception can arise from putMVar as well
16:57 < timCF> int-e: I'm using `tryPutMVar` to avoid this error case.
16:57 < int-e> "Hard to say"--we'll need more details to narrow things down.
16:59 < merijn> timCF: If at some point all threads who can write are dead, then how would you ever recover?
17:00 < merijn> timCF: What are you using the MVar for?
17:00 < timCF> int-e: I can send a link to source code, maybe code explains itself better than me 
17:00 < timCF> https://github.com/coingaming/lnd-client/blob/1a45d5fa731f39f73243a8a624d583a5858729b3/src/LndClient/RPC/Silent.hs#L131-L143
17:01 < merijn> timCF: Why not use an actual channel?
17:01 < timCF> merijn: Just to detect that something happened in spawned thread. In case where all processes are dead, tryTakeMVar will be always nothing and retry counter will run into zero
17:02 < timCF> merijn: you mean TChan instead of MVar?
17:03 < timCF> Is there any advantage in TChan vs MVar for case where it's needed just to detect once that something happened?
17:03 < merijn> That means wrapping with extra STM to check if it's closed, though
17:03 < merijn> timCF: hm
17:03 < merijn> timCF: You just have N workers and wanna know if they're all dead, yeah?
17:03 < merijn> timCF: Don't you just want QSemN?
17:06 < timCF> merijn: No, I have a gRPC subscription procedure which should return something from a server stream in case of success. So I'm giving the stream handler a reference to an MVar to 
               fill in. Then I spawn the subscription processes, try to read this MVar in the main process and hope for the best
17:07 < timCF> In case this "stream callback MVar" was not filled in 10 attempts - all procedure fails.
17:09 < kuribas> Is there a validation transformer?
17:10 < timCF> Spawned gRPC procedure might die before calling stream callback, which results in no active MVar references, but it should be fine because I'm using `tryPutMVar` and 
               `tryTakeMVar` to avoid deadlock detection of Haskell runtime.
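
The polling scheme described in the messages above can be sketched roughly as follows. This is not the code from `LndClient.RPC.Silent`; the names `pollMVar` and `runWithCallback`, the 10 ms delay, and the `String` payload are assumptions made for illustration. The worker thread stands in for the gRPC subscription and either fills the MVar through `tryPutMVar` or exits without doing so.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, tryPutMVar, tryTakeMVar)
import Control.Monad (void, when)

-- | Poll an MVar without ever blocking on it: try up to `attempts`
-- times, sleeping `delayMicros` microseconds after each empty read.
-- Returns Nothing if the MVar was never filled. (Hypothetical helper.)
pollMVar :: Int -> Int -> MVar a -> IO (Maybe a)
pollMVar attempts delayMicros var
  | attempts <= 0 = pure Nothing
  | otherwise = do
      res <- tryTakeMVar var
      case res of
        Just x -> pure (Just x)
        Nothing -> do
          threadDelay delayMicros
          pollMVar (attempts - 1) delayMicros var

-- | Hypothetical stand-in for the subscription: the worker either
-- calls the "stream callback" (fills the MVar) or dies before doing so.
runWithCallback :: Bool -> IO (Maybe String)
runWithCallback workerSucceeds = do
  var <- newEmptyMVar
  _ <- forkIO $ when workerSucceeds $ void (tryPutMVar var "update")
  -- 10 attempts, 10 ms apart (assumed values).
  pollMVar 10 10000 var
```

With these definitions, `runWithCallback True` is expected to yield the worker's value and `runWithCallback False` to yield `Nothing` once the retry counter runs out. Neither path calls `takeMVar` or `putMVar`, so this code on its own should not be the thread that the runtime reports as blocked.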
tim2CF commented 3 years ago
17:32 < int-e> Then again I'm not sure I can help anyway. AFAIUI, try{Put,Take}MVar *can't* cause this exception because they can never block at all.
17:33 < int-e> Which strongly suggests that the actual MVar that's causing this is somewhere else, maybe in a dependency.
17:35 < timCF> int-e: Hmm. Thanks for the reply anyway! Maybe the implementation of these functions was different before. I'm not using the latest version of base, it's the one from lts-14.27
17:36 < int-e> these have been dedicated primops for a very long time
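
To illustrate int-e's point, here is a minimal sketch using only `base` that contrasts the blocking and non-blocking reads on an MVar no other thread can reach. The function names `blockingTake` and `nonBlockingTake` are made up for this example. On GHC, the blocking variant is typically woken by the runtime with `BlockedIndefinitelyOnMVar` once the garbage collector notices that nothing can ever fill the MVar, while `tryTakeMVar` returns immediately. The `ExceptionInLinkedThread` wrapper in the failure output above looks like the one the `async` package uses for linked threads, which would be consistent with the blocking call living in a different, linked thread; that part is a guess, not something the logs confirm.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar, tryTakeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- | Blocking read on an MVar that no other thread references.
-- The runtime's deadlock detection is expected to throw
-- BlockedIndefinitelyOnMVar into this thread; `try` catches it.
blockingTake :: IO (Either BlockedIndefinitelyOnMVar ())
blockingTake = do
  var <- newEmptyMVar :: IO (MVar ())
  try (takeMVar var)

-- | Non-blocking read on the same kind of MVar: never waits,
-- so deadlock detection has nothing to report.
nonBlockingTake :: IO (Maybe ())
nonBlockingTake = do
  var <- newEmptyMVar :: IO (MVar ())
  tryTakeMVar var
```

If the exception keeps appearing even though the application code only uses the `try*` variants, one way to narrow it down is to look for `takeMVar`, `putMVar`, `readMVar` or `withMVar` calls in the code running on the reported thread, including dependencies.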