batuhan / og-saltana

Abandon iteration for Minipage
https://www.minipagehq.com

critical bug - redis connection in broken state #40

Open batuhan opened 2 years ago

batuhan commented 2 years ago

We get the following error both locally and in production when using the production Redis installation. We might be hitting the connection limit of the cloud instance (DigitalOcean for now, but AWS has similar restrictions).

The code is at ./services/core/src/redis.js. It does use a singleton.

This is critical because the Redis connection is essential for the workers, Signal, and the multi-tenancy of our platform, so a failed Redis connection should (and currently does) cause the process to exit.
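One thing worth checking in the singleton is the retry policy. As far as I know, node_redis 2.6+/3.x accepts a `retry_strategy` option, and the "retry aborted" variant of this error is what the client reports when that function returns something other than a number. Below is a minimal sketch of a bounded backoff policy, not the code currently in redis.js. The function name and both limits are assumptions chosen for illustration.

```javascript
// Hypothetical retry policy for the Redis singleton.
// Both limits are illustrative assumptions, not values from the codebase.
const MAX_TOTAL_RETRY_MS = 60 * 1000; // stop retrying after one minute
const MAX_DELAY_MS = 3000;            // never wait more than 3 s between attempts

// node_redis calls this with { error, total_retry_time, times_connected, attempt }.
// Returning a number schedules a reconnect after that many milliseconds;
// returning an Error tells the client to give up and surface that error.
function retryStrategy(options) {
  if (options.total_retry_time > MAX_TOTAL_RETRY_MS) {
    return new Error('Redis retry time exhausted');
  }
  // Linear backoff: 100 ms, 200 ms, ... capped at MAX_DELAY_MS.
  return Math.min(options.attempt * 100, MAX_DELAY_MS);
}
```

It would be passed as `retry_strategy: retryStrategy` in the options given to `redis.createClient`. With this in place the process would still exit once the retry window is exhausted, but a short network blip or a momentarily full connection pool would no longer kill it immediately.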

We might think about stripping Redis out in favour of something like SQS for the workers, Firebase etc. for Signal, and environment variables for multi-tenancy, but this would add to our time to deploy.

{"context":{"application":"saltana-core","instanceId":"default","logLevel":50,"err":{"code":"CONNECTION_BROKEN","name":"Error","message":"Redis connection in broken state: retry aborted.","stack":"Error: Redis connection in broken state: retry aborted.\n    at RedisClient.connection_gone (/root/saltana/node_modules/redis/index.js:569:30)\n    at TLSSocket.<anonymous> (/root/saltana/node_modules/redis/index.js:231:14)\n    at Object.onceWrapper (node:events:509:28)\n    at TLSSocket.emit (node:events:402:35)\n    at TLSSocket.emit (node:domain:475:12)\n    at endReadableNT (node:internal/streams/readable:1343:12)\n    at processTicksAndRejections (node:internal/process/task_queues:83:21)"}},"message":"Uncaught exception","sequence":"0","time":1640699305053,"version":"2.0.0"}
{"log.level":"info","@timestamp":"2021-12-28T13:48:25.063Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"9d90690552ae800af31ccebd194527df\"}"}
{"log.level":"info","@timestamp":"2021-12-28T13:48:25.064Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"f3f8db398ed137c65a4b2b187bc03fb6\"}"}

batuhan commented 2 years ago

[Screenshot: Screen Shot 2021-12-28 at 5 09 02 PM]

APM works, so that's good.

batuhan commented 2 years ago

Disabling the Signal/WebSockets feature helps but does not solve this issue completely, since we are only delaying the problem rather than solving it. getPlatformData should just return values from the env, and we shouldn't depend on Redis at all.
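A rough sketch of what an env-backed lookup could look like, to make the idea concrete. The real signature of getPlatformData is not shown in this issue, so the function name `getPlatformDataFromEnv`, its parameters, and the `PLATFORM_<ID>_<KEY>` variable naming scheme are all assumptions.

```javascript
// Hypothetical env-backed lookup; nothing here touches Redis.
// Assumed naming scheme: PLATFORM_<ID>_<KEY>, upper-cased, with any
// character outside A-Z, 0-9 and _ replaced by an underscore.
function getPlatformDataFromEnv(platformId, key, env = process.env) {
  const name = `PLATFORM_${platformId}_${key}`
    .toUpperCase()
    .replace(/[^A-Z0-9_]/g, '_');
  const raw = env[name];
  if (raw === undefined) return null; // nothing configured for this platform/key
  try {
    return JSON.parse(raw); // allow structured values stored as JSON
  } catch (_) {
    return raw; // fall back to the plain string
  }
}
```

Taking `env` as a parameter keeps the function easy to exercise in tests without mutating `process.env`.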

This might have implications for the tests we are running, but that's also a design flaw.