SAFEHR-data / emap

Near real-time clinical database for research and innovation

Deal with problem of multiple emap instances needing access to the same waveform data #48

Open jeremyestein opened 2 months ago

jeremyestein commented 2 months ago

Epic: #29

The current design does not account for the need to run (say) a live and a dev Emap instance at the same time.

Each Emap instance listens for its own HL7 stream, but unless we ask for the Smartlinx data to be sent to multiple ports, it will need to be distributed to multiple instances somehow.

Since the listening software may itself need updating as part of a dev cycle, it seems better to me that as little of the pipeline as possible sits upstream of the multicasting. That reduces the need for updates to a part of the system that serves multiple instances of Emap (which may be running different versions).

You could implement this purely at the TCP level: a single program listens for data and then passes it on to the other instances (it would have to know where they are, get through some Docker networking stuff, handle failures in some way, and so on). This seems like something where an off-the-shelf solution should exist. nginx?
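To make the TCP-level option concrete, here is a rough sketch of what such a fan-out program might look like with Python's `asyncio`. Everything in it is an assumption for illustration: the class name `TcpFanOut`, the downstream addresses, and the choice to silently drop data for an instance that is down. It is not how Emap or Smartlinx actually works.

```python
import asyncio


class TcpFanOut:
    """Accept one upstream TCP feed and copy every chunk to each downstream."""

    def __init__(self, downstreams):
        # downstreams: (host, port) pairs, one per Emap instance (hypothetical).
        self.downstreams = list(downstreams)
        self.writers = {}  # (host, port) -> asyncio.StreamWriter

    async def _writer_for(self, addr):
        """Return an open writer for addr, (re)connecting if needed."""
        writer = self.writers.get(addr)
        if writer is None or writer.is_closing():
            try:
                _, writer = await asyncio.open_connection(*addr)
            except OSError:
                return None  # instance unreachable right now
            self.writers[addr] = writer
        return writer

    async def broadcast(self, chunk):
        """Send chunk to every downstream; a failing one does not stop the rest."""
        for addr in self.downstreams:
            writer = await self._writer_for(addr)
            if writer is None:
                continue  # assumption: data for a down instance is simply lost
            try:
                writer.write(chunk)
                await writer.drain()
            except OSError:
                # Drop the broken connection; the next chunk triggers a reconnect.
                writer.close()
                self.writers.pop(addr, None)

    async def handle_upstream(self, reader, writer):
        """Relay bytes from one upstream connection until it closes."""
        try:
            while chunk := await reader.read(65536):
                await self.broadcast(chunk)
        finally:
            writer.close()

    async def serve(self, host, port):
        """Start listening for the upstream feed; returns the asyncio server."""
        return await asyncio.start_server(self.handle_upstream, host, port)
```

Even this small sketch shows the awkward parts: the relay has to be configured with every subscriber, an instance that is down just misses data, and because the downstream writes happen one after another, a slow instance can hold up the others.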

A better method might be to take the IDS approach. The single listener writes the relatively unprocessed data to a DB or RabbitMQ queue, and multiple readers are free to read in their own time. That way, the listener doesn't have to know who the subscribers are, although to stop storage growing without bound, it would need to periodically delete old data. Pure HL7 might be way too large here, given its extremely poor information density. (Possible compromise: only buffer a very small amount of data, e.g. 1 hour. That is still enough to be useful and won't exceed capacity, though it might be a lot of I/O.)
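As a rough illustration of the buffered approach, the sketch below uses SQLite from the Python standard library as a stand-in for whatever DB or queue we would really use. The table name `raw_message`, the class `RawBuffer`, and the one-hour `RETENTION_SECONDS` are all assumptions made up for this example.

```python
import sqlite3
import time

RETENTION_SECONDS = 3600  # the "1 hour" compromise; a guess, to be tuned


class RawBuffer:
    """Append-only store of raw messages that readers poll at their own pace."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # AUTOINCREMENT means ids are never reused after old rows are deleted,
        # so a reader's saved position stays valid across pruning.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS raw_message ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " received_at REAL NOT NULL,"
            " payload BLOB NOT NULL)"
        )

    def append(self, payload, now=None):
        """Called by the single listener; returns the new message id."""
        now = time.time() if now is None else now
        cur = self.db.execute(
            "INSERT INTO raw_message (received_at, payload) VALUES (?, ?)",
            (now, payload),
        )
        self.db.commit()
        return cur.lastrowid

    def read_after(self, last_id, limit=100):
        """Called by each reader with the last id it has already processed."""
        return self.db.execute(
            "SELECT id, payload FROM raw_message WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, limit),
        ).fetchall()

    def prune(self, now=None, max_age=RETENTION_SECONDS):
        """Delete messages older than max_age seconds; returns rows removed."""
        now = time.time() if now is None else now
        cur = self.db.execute(
            "DELETE FROM raw_message WHERE received_at < ?", (now - max_age,)
        )
        self.db.commit()
        return cur.rowcount
```

Here the live and dev instances would each remember their own `last_id` and call `read_after` independently, so the listener never needs to know about them. The trade-off is that a reader which falls more than the retention window behind will miss whatever `prune` has already deleted.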

A decision on this will likely have to wait until I know more about how Smartlinx works.