Open rkm opened 3 years ago
Wouldn't it be easier to make the downstream services handle duplication correctly, e.g. by overwriting a file in the output directory? There are probably quite a few situations where we might want to re-run an extraction or restart from the beginning.
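A minimal sketch of what "downstream services deal with duplication" could look like, assuming each output is written to a deterministic path derived from the message. The `write_output` helper and its path layout are hypothetical, not part of the existing services. The file is written to a temporary name and then atomically swapped into place with `os.replace`, so a duplicated message simply overwrites the same file instead of producing a second copy.

```python
import os
import tempfile


def write_output(output_dir: str, name: str, data: bytes) -> str:
    """Hypothetical idempotent writer: the same message always maps to the same file."""
    os.makedirs(output_dir, exist_ok=True)
    target = os.path.join(output_dir, name)
    # Create the temp file in the same directory so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=output_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Atomically overwrite any copy written by an earlier (duplicate) message
        os.replace(tmp_path, target)
    except BaseException:
        # Only reached if writing or replacing failed, so the temp file still exists
        os.unlink(tmp_path)
        raise
    return target
```

Processing the same message twice leaves exactly one file with the same contents, so re-running an extraction from the beginning is harmless (just wasted work).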
There are also some use cases where you would want Ctrl+C to just exit ASAP, e.g. if it is blocked querying a table that takes a long time to respond to each query (for example, a large table that is missing an index).
Yes, idempotence makes this a non-issue. A "clean" shutdown is usually neater when possible, though, so having the Ctrl+C/SIGTERM handler set a flag on the first invocation and quit abruptly on the second seems like the best approach? Retrying some messages could result in a lot of extra work (like redoing a whole 1TB extraction that was 99% complete): we need to make sure that doing so doesn't actually break anything, but we should also try to minimise how often it happens.
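The "set a flag first, quit abruptly second" idea can be sketched in Python as follows. This is an illustration only (the real services are not written in Python), and the `ShutdownFlag` class is hypothetical. On the first SIGINT/SIGTERM the handler records the request and then restores the default signal disposition, so a second signal terminates the process at the OS level even if the program is stuck inside a slow database call that never returns to the interpreter.

```python
import signal
import threading


class ShutdownFlag:
    """First SIGINT/SIGTERM requests a clean stop; a second one kills the process."""

    def __init__(self) -> None:
        self._requested = threading.Event()

    @property
    def requested(self) -> bool:
        return self._requested.is_set()

    def handle(self, signum, frame) -> None:
        # Ask the main loop to stop at the next safe point
        self._requested.set()
        # Restore the default disposition: the next signal terminates the
        # process immediately, without needing any Python code to run
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)

    def install(self) -> None:
        # Must be called from the main thread
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)
```

The worker loop then polls `flag.requested` at points where stopping is safe; relying on the OS default for the second signal avoids the problem of a Python-level handler never getting a chance to run while a blocking query is in progress.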
When CohortExtractor is shut down, either by a signal (Ctrl+C) or through the control message queue, it is extremely likely to be midway through processing a single input message and to have already sent some of the corresponding responses downstream. Since it exits immediately, that input message is re-queued, and the service instance that picks it up next will end up sending duplicate messages.
While we can't be completely robust against message duplication (e.g. in the case of a complete crash), we can at least fix the service so that it finishes sending all the messages for a given input message before exiting. The machinery for this should be similar to that in other services, e.g. DicomTagReader.
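A language-neutral sketch of "finish the current input message, then exit", written in Python and assuming a broker that re-queues unacknowledged messages (as RabbitMQ does when a consumer's channel closes). The `Extractor` class and its `publish`, `ack` and `expand` callables are hypothetical stand-ins, not the actual CohortExtractor or DicomTagReader code. The key points are that the shutdown flag is checked only *between* input messages, and that an input message is acknowledged only after every response for it has been published.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Extractor:
    """Hypothetical consumer that only stops at input-message boundaries."""
    publish: Callable[[str], None]          # sends one response downstream
    ack: Callable[[str], None]              # acknowledges one input message
    expand: Callable[[str], Iterable[str]]  # maps an input message to its responses
    stop: threading.Event = field(default_factory=threading.Event)

    def handle(self, message_id: str) -> bool:
        """Process one input message; return False if shutdown was requested."""
        if self.stop.is_set():
            # Don't start: leave it unacknowledged so the broker re-queues it intact
            return False
        for response in self.expand(message_id):
            # No shutdown check here, so all responses for this input go out together
            self.publish(response)
        # Acknowledge only once every response has been sent
        self.ack(message_id)
        return True

    def run(self, messages: Iterable[str]) -> None:
        for message_id in messages:
            if not self.handle(message_id):
                break
```

If `stop` is set (by a signal handler or a control-queue message) while the first input is being processed, all of that input's responses are still published and it is acked, and the service exits before touching the next input. A crash mid-message can still cause duplicates, which is why downstream idempotence remains worthwhile.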