Closed jasagredo closed 4 weeks ago
Analyzing the given logs we found an error
call being thrown:
2407853-{"at":"2024-07-23T04:39:52.549336219Z","ns":"TxSubmission.TxInbound.Collected","data":{"count":1,"kind":"TraceTxSubmissionCollected","peer":{"connectionId":"172.31.82.173:30004 52.59.156.159:59440"}},"sev":"Debug","thread":"15279","host":"client-us-02"}
2407854:{"at":"2024-07-23T04:39:52.549589434Z","ns":"Net.InboundGovernor.Remote.ResponderErrored","data":{"connectionId":{"localAddress":{"address":"172.31.82.173","port":"30004"},"remoteAddress":{"address":"52.202.221.142","port":"30007"}},"kind":"ResponderErrored","miniProtocolNum":{"kind":"MiniProtocolNum","num":4},"reason":"InMemoryClosedExn\nCallStack (from HasCallStack):\n error, called at src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs:75:45 in ouroboros-consensus-0.20.0.0-1Z2ZJe3BSfK6gl0UefXzYq:Ouroboros.Consensus.Storage.LedgerDB.V2.InMemory"},"sev":"Info","thread":"119","host":"client-us-02"}
The miniprotocol in question is TxSumbission, so it makes sense that if it errored then the TMVar was not put back.
Some consensus code is using a LedgerTablesHandle that has been closed. Question is now, where is this handle, who is using it after close.
Closed as complete via #1208
Analyzing the logs from the two nodes that stoped forging, it seems clear that they both stopped in the following chunk of code:
The Mempool uses a TMVar to be able to run actions in IO to access the tables, however if an exception was raised at some point the TMVar might not have been properly put back. It seems unclear what exception could be thrown there, but the result seems pretty clear: the TMVar is not put back and the node cannot continue.