CardanoSolutions / ogmios

❇️ A WebSocket JSON/RPC bridge for Cardano
https://ogmios.dev
Mozilla Public License 2.0
304 stars 90 forks source link

Error During Connection Closure in Ogmios v6.4.0 After Transaction Evaluations Query #403

Closed bhatt-deep closed 3 months ago

bhatt-deep commented 3 months ago

What Git revision / release tag are you using?

Ogmios v6.4.0

Do you use any client SDK? If yes, which one?

python-ogmios

Describe what the problem is?

The issue with Ogmios v6.4.0 revolves around an error encountered during the connection closure phase after transaction evaluations query. Although the transaction evaluations themselves complete successfully and return the expected results, an AsyncCancelled error occurs when the system attempts to close these WebSocket connections. I have tested this issue using the Python SDK and through direct calls from Postman with transactions from the Eternal wallet. This problem impacts active connections, leading to performance issues.

What should be the expected behavior?

The server should handle connection closures smoothly and efficiently after completing transaction evaluations, without errors.

If applicable, what are the logs from the server around the occurence of the problem?

{"severity":"Info","timestamp":"2024-07-11T00:22:43.223835153Z","thread":"18","message":{"Health":{"tag":"HealthTick","status":{"startTime":"2024-07-10T23:07:00.538807756Z","lastKnownTip":{"slot":64974163,"id":"5c80be818a9a51b88e6b4f1ab0b79843786882d5915b9d38d50133ef4d82f2af","height":2457031},"lastTipUpdate":"2024-07-11T00:22:43.223271626Z","networkSynchronization":1.00000,"currentEra":"babbage","metrics":{"activeConnections":2,"runtimeStats":{"cpuTime":19088945452,"currentHeapSize":1265,"gcCpuTime":15071009211,"maxHeapSize":1483},"sessionDurations":{"max":0,"mean":0,"min":0},"totalConnections":2,"totalMessages":4,"totalUnrouted":0},"connectionStatus":"connected","currentEpoch":154,"slotInEpoch":87763,"version":"v6.4.0 (4dbf1f34)","network":"preprod"}}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.128821401Z","thread":"1011","message":{"WebSocket":{"tag":"WebSocketConnectionAccepted","userAgent":"User-Agent unknown"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.129345971Z","thread":"1015","message":{"WebSocket":{"contents":{"agency":"ClientAgency TokPropose","event":"send","tag":"ProposeVersions","versions":["NodeToClientV_15","NodeToClientV_16"]},"tag":"WebSocketClient"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.129455094Z","thread":"1016","message":{"WebSocket":{"contents":{"agency":"ClientAgency TokPropose","event":"send","tag":"ProposeVersions","versions":["NodeToClientV_15","NodeToClientV_16"]},"tag":"WebSocketClient"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.13007351Z","thread":"1016","message":{"WebSocket":{"contents":{"agency":"ServerAgency TokConfirm","event":"receive","tag":"AcceptVersion","version":"\"NodeToClientV_15\""},"tag":"WebSocketClient"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.13034125Z","thread":"1015","message":{"WebSocket":{"contents":{"agency":"ServerAgency TokConfirm","event":"receive","tag":"AcceptVersion","version":"\"NodeToClientV_15\""},"tag":"WebSocketClient"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.131918098Z","thread":"1028","message":{"WebSocket":{"contents":{"atSlot":64974164,"ns":6565465825373618,"tag":"TxSubmissionLocalMempoolNewSnapshot"},"tag":"WebSocketTxSubmission"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:22:47.132332293Z","thread":"1028","message":{"WebSocket":{"contents":{"tag":"TxSubmissionLocalMempoolSynchronized","took":"413µs"},"tag":"WebSocketTxSubmission"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Error","timestamp":"2024-07-11T00:22:51.405651118Z","thread":"1032","message":{"WebSocket":{"exception":"AsyncCancelled","tag":"WebSocketUnknownException"}},"version":"v6.4.0 (4dbf1f34)"}
{"severity":"Info","timestamp":"2024-07-11T00:23:15.156028423Z","thread":"18","message":{"Health":{"tag":"HealthTick","status":{"startTime":"2024-07-10T23:07:00.538807756Z","lastKnownTip":{"slot":64974195,"id":"3d1cc6914adcfcdc1e5259a15831668861ec8791556afe2be434c67584f7c23b","height":2457032},"lastTipUpdate":"2024-07-11T00:23:15.155364787Z","networkSynchronization":1.00000,"currentEra":"babbage","metrics":{"activeConnections":3,"runtimeStats":{"cpuTime":19324199683,"currentHeapSize":1421,"gcCpuTime":15239590685,"maxHeapSize":1641},"sessionDurations":{"max":0,"mean":0,"min":0},"totalConnections":3,"totalMessages":6,"totalUnrouted":0},"connectionStatus":"connected","currentEpoch":154,"slotInEpoch":87795,"version":"v6.4.0 (4dbf1f34)","network":"preprod"}}},"version":"v6.4.0 (4dbf1f34)"}
KtorZ commented 3 months ago
  1. Thanks for reporting, but this is quite curious. I can indeed reproduce, and only for transaction evaluation. I'll dig into this.

  2. Also, I am trying to understand how "This problem impacts active connections, leading to performance issues."? In what form are active connections impacted exactly?

bhatt-deep commented 3 months ago

Thank you for looking into this so quickly.

The main issue for us is the growing number of Ogmios active connections over time, which we first noticed due to an unusually high number of connections for our setup. Upon investigating, we linked this increase directly to the AsyncCancelled error that occurs during the disconnection phase of transaction evaluation queries. Each time a transaction evaluation is made and subsequently attempted to disconnect, the error prevents these connections from closing properly, causing an increase in active connections.

This buildup of unclosed connections consumes significant server resources, leading to performance degradation. I'm also sharing the health stats where we are noticing this issue.

{
  "startTime": "2024-07-08T16:58:27.717868478Z",
  "lastKnownTip": {
    "slot": 65031603,
    "id": "c621cd16a133ebea63adb6958f7b54d7ab3d2c2475bf2f1c8f7f7b2e12574da2",
    "height": 2459630
  },
  "lastTipUpdate": "2024-07-11T16:20:03.302878753Z",
  "networkSynchronization": 1,
  "currentEra": "babbage",
  "metrics": {
    "activeConnections": 1406,
    "runtimeStats": {
      "cpuTime": 9228600571985,
      "currentHeapSize": 239145,
      "gcCpuTime": 8828108601104,
      "maxHeapSize": 239451
    },
    "sessionDurations": {
      "max": 0,
      "mean": 3019.1511481598,
      "min": 0
    },
    "totalConnections": 7764,
    "totalMessages": 15528,
    "totalUnrouted": 2
  },
  "connectionStatus": "connected",
  "currentEpoch": 154,
  "slotInEpoch": 145203,
  "version": "v6.4.0 (4dbf1f34)",
  "network": "preprod"
}