Plutonomicon / cardano-transaction-lib

A PureScript library for building smart contract transactions on Cardano
https://plutonomicon.github.io/cardano-transaction-lib/
MIT License

Fix websocket connection issues #321

Closed adamczykm closed 2 years ago

adamczykm commented 2 years ago

WebSocket connections to Ogmios and the datum cache fail about 50% of the time, and nothing is communicated back to the caller of the CTL functions: an error is logged to the console, but the frontend is not notified of the failure and things hang.

Existing intel by @nrutledge and t.m.:

t: "The websocket issue is on the frontend/CTL side. I started a local ws server with websocat and only 1/4 of the requests are getting there."

n: "I've tested a bit on my end by manually creating websocket connections from the browser dev tools and also using Postman. In both cases I'm getting identical behavior to the Seabug frontend. In fact, I've tested simultaneously, and if the frontend websocket connection fails, the connection from dev tools or Postman also fails."

t: "OK, it is a nightmare race condition to debug. I created a dummy ogmios (not datum cache) server, and I didn't receive some of the requests, because if the frontend opens the websocket to the datum cache first, everything breaks due to Unexpected server response: 404 returned by ogmios-datum-cache."

Anyone more involved is welcome to provide a more structured description and hints on the issue.

ngua commented 2 years ago

I think this is a duplicate of #233

adamczykm commented 2 years ago

The other one doesn't mention that there are issues with connections

nrutledge commented 2 years ago

I've removed the message I posted above and put a similar one in #233. That way this ticket is focused on the connection issues, which require troubleshooting Ogmios and the datum cache, and the other one is focused on the PureScript logic.

adamczykm commented 2 years ago

Copied @klntsky's thoughts for better visibility:

A couple of thoughts related to WebSocket errors.

For each type of query, we should maintain a request queue in the form of a mapping from listener ids to the actual request payloads (e.g. FinalizedTransaction in the case of submitTx). Let's call it a pending requests queue.

The queue would allow replaying the requests in case of a failure. In case ws.onerror triggers, all response listeners should be removed and a new websocket should be created. Because of this logic, the pending request queue should be persisted across websockets and only initialized once at application startup.
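To make the idea concrete, here is a minimal TypeScript sketch of such a queue. It is not CTL code: PendingRequests, SocketLike, recoverFromError and the connect factory are all hypothetical names, and the sketch assumes that whatever creates the new socket also re-attaches the response listeners.

```typescript
// Sketch only: every name here is hypothetical and not part of CTL.
type ListenerId = string;

// Minimal socket surface the queue needs; a browser WebSocket satisfies it.
interface SocketLike {
  send(data: string): void;
}

class PendingRequests<Payload> {
  // Listener id -> request payload that has not been answered yet.
  private readonly pending = new Map<ListenerId, Payload>();
  private readonly encode: (id: ListenerId, payload: Payload) => string;

  constructor(encode: (id: ListenerId, payload: Payload) => string) {
    this.encode = encode;
  }

  // Record the request before sending, so it stays replayable if the send fails.
  send(socket: SocketLike, id: ListenerId, payload: Payload): void {
    this.pending.set(id, payload);
    socket.send(this.encode(id, payload));
  }

  // Call when the response for `id` arrives; returns false for unknown ids.
  resolve(id: ListenerId): boolean {
    return this.pending.delete(id);
  }

  // Resend every unanswered request on a fresh socket; returns how many were sent.
  replay(socket: SocketLike): number {
    let sent = 0;
    this.pending.forEach((payload, id) => {
      socket.send(this.encode(id, payload));
      sent += 1;
    });
    return sent;
  }

  get size(): number {
    return this.pending.size;
  }
}

// On a socket error: open a replacement socket and replay the queue onto it.
// `connect` is an assumed factory that also re-attaches response listeners.
function recoverFromError<P>(
  queue: PendingRequests<P>,
  connect: () => SocketLike
): SocketLike {
  const fresh = connect();
  queue.replay(fresh);
  return fresh;
}
```

The queue object is created once and outlives any individual socket, which is the property the paragraph above asks for.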

Our current implementation is callback-based, i.e. mkOgmiosWebSocket' accepts a function that accepts a record of ListenerSets that are used downstream. So the WS-related code should be reorganized to allow passing the requests queue around. We also construct an Aff directly with mkOgmiosWebSocketAff, which I think is not suitable for the new requirements.

Another problem is that there's no guarantee that a query hasn't been completed if it is in the pending requests queue. So in the case of submitTx, the replay code should include a workaround for the case when we try to submit the tx during replay and it fails because of consumed UTXOs.
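One possible shape for that workaround, sketched in TypeScript under stated assumptions: submit and isKnown are hypothetical callbacks standing in for the real submission and lookup queries, and the sketch assumes there is some way to ask whether a transaction has already been accepted. It is an illustration of the idea, not the approach CTL takes.

```typescript
// Sketch only: all names are hypothetical; the callbacks stand in for real queries.
type TxId = string;

type ReplayOutcome = "submitted" | "already-accepted" | "failed";

interface SubmitDeps {
  // Submits the transaction; the promise rejects if the node refuses it.
  submit: (txId: TxId) => Promise<void>;
  // Assumed lookup: reports whether the transaction was already accepted.
  isKnown: (txId: TxId) => Promise<boolean>;
}

async function replaySubmit(txId: TxId, deps: SubmitDeps): Promise<ReplayOutcome> {
  try {
    await deps.submit(txId);
    return "submitted";
  } catch (_err) {
    // A rejection during replay is ambiguous: the original submission may have
    // gone through before the socket failed, consuming the UTXOs. Check first.
    const accepted = await deps.isKnown(txId);
    return accepted ? "already-accepted" : "failed";
  }
}
```

With this shape, a replayed submitTx that fails only because the first attempt already succeeded is reported to the caller as a success rather than an error.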

nrutledge commented 2 years ago

This package may be helpful for implementing the connection retry logic: https://www.npmjs.com/package/reconnecting-websocket
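For a sense of what retry logic of this kind involves, here is a generic capped exponential backoff in TypeScript. It is a sketch, not the package's implementation; reconnectDelayMs and its parameter names and defaults are illustrative assumptions.

```typescript
// Generic capped exponential backoff; names and defaults are illustrative only.
function reconnectDelayMs(
  attempt: number,
  minMs: number = 1000,
  maxMs: number = 10000,
  factor: number = 1.5
): number {
  // Attempt 0 is the first retry; the delay grows geometrically up to maxMs.
  return Math.min(maxMs, minMs * Math.pow(factor, attempt));
}
```

A reconnect loop would wait reconnectDelayMs(n) before the n-th retry and reset n to 0 once a connection opens.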