LinuxForHealth / connect

LinuxForHealth Data Flows
Apache License 2.0

Debug sync start up #105

Closed by ccorley 3 years ago

ccorley commented 3 years ago

With the latest sync changes deployed across LFH instances, it's clear that there is an intermittent startup issue: we don't always receive NATS messages from the remote LFH instance. The outcome is fixed for the lifetime of a run. If we receive NATS messages after startup, we keep receiving them; if we don't, we never do. A clean restart of a local LFH instance (docker container restart followed by pyconnect restart) doesn't reliably fix it: several restarts in a row may receive no sync messages at all, followed by one where the NATS sync messages do arrive.

Things to check:

ccorley commented 3 years ago

The problem seems to be that the Python NATS client treats the servers in the connection string as a server pool, as if they were members of a single NATS cluster, to provide message delivery reliability: https://github.com/nats-io/nats.py/blob/50034c9a13c7ffe85cdf36be97ccf858559144b5/nats/aio/client.py#L269 What it doesn't seem to do is connect to, and exchange messages with, each of the servers in the connection string. That would explain the intermittency: we would only see remote sync messages on runs where the client happens to pick the remote server from the pool.

So what we need to do is create one client instance per NATS server we want to connect to. The NATS client is already async, so we just need to instantiate multiple clients, each with a single server in its connection list.
cc: @dixonwhitmire
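
A rough sketch of the one-client-per-server idea, not the actual pyconnect change. The helper names (`make_nats_client`, `connect_per_server`) and the `client_factory` parameter are made up for illustration, and it assumes plain `connect`/`subscribe` from nats.py with no extra connection options:

```python
import asyncio
from typing import Any, Awaitable, Callable, List


def make_nats_client() -> Any:
    """Default factory (hypothetical helper): returns a fresh nats.py client."""
    # Imported lazily so the sketch can be exercised with a stub factory
    # when nats.py is not installed.
    from nats.aio.client import Client as NATS

    return NATS()


async def connect_per_server(
    servers: List[str],
    subject: str,
    handler: Callable[[Any], Awaitable[None]],
    client_factory: Callable[[], Any] = make_nats_client,
) -> List[Any]:
    """Create one client per server URL and subscribe each to `subject`."""
    clients = []
    for url in servers:
        client = client_factory()
        # A single-entry server list, so the client cannot treat the
        # other LFH instances as failover members of the same pool.
        await client.connect(servers=[url])
        await client.subscribe(subject, cb=handler)
        clients.append(client)
    return clients
```

Calling it would look something like `await connect_per_server(["nats://local-host:4222", "nats://remote-host:4222"], "some.subject", on_message)`, where the URLs, the subject, and `on_message` are placeholders. The returned clients need to be kept around so each one can be closed on shutdown.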