Open DominicOram opened 9 months ago
We should, at the start of an experiment:
This should be fairly easy to handle in __main__.py,
using some derivative of the monitoring code at https://github.com/DiamondLightSource/hyperion/blob/947_run_callbacks_in_separate_process/tests/system_tests/external_interaction/callbacks/test_external_callbacks.py
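A minimal sketch of what such a start-of-experiment check could look like, assuming each external service can be reduced to a callable that returns True when it is alive (for example the `is_alive` method of a `multiprocessing.Process`). The names `ExternalServiceDown` and `check_services_alive` are hypothetical and are not taken from the Hyperion codebase or the linked test:

```python
from typing import Callable, Mapping


class ExternalServiceDown(RuntimeError):
    """Hypothetical error: a required external service is not responding."""


def check_services_alive(checks: Mapping[str, Callable[[], bool]]) -> None:
    """Run every liveness check and raise if any service is down.

    `checks` maps a human-readable service name to a zero-argument callable
    that returns True when the service is alive. For a callback process this
    could be `process.is_alive` on a `multiprocessing.Process` (assumption).
    """
    dead = []
    for name, is_alive in checks.items():
        try:
            alive = is_alive()
        except Exception:
            # A check that blows up is treated the same as a dead service.
            alive = False
        if not alive:
            dead.append(name)
    if dead:
        raise ExternalServiceDown("Not running: " + ", ".join(dead))
```

Calling this before starting a plan would stop the run early instead of collecting data that cannot be recorded. As noted in the discussion, a start-only check does not say anything about whether the previous run completed cleanly.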
Is the start enough? If we do it only at the start, it might not be obvious that it's the data from the last run that is potentially corrupted.
ah yeah, that's not enough. I think the simplest way to get the info back to Hyperion is that the data service should remember if it failed something, and refuse to start if the last run went wrong. Otherwise we need some kind of DataserviceLivenessDevice
and I don't like where that would be going...
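A rough sketch of the "remember the failure and refuse to start" idea, assuming the data service can persist a small marker file between runs. The class and method names (`FailureMarker`, `record_failure`, `refuse_if_failed`) and the use of a file are illustrative assumptions, not something that exists in Hyperion:

```python
from pathlib import Path


class PreviousRunFailed(RuntimeError):
    """Hypothetical error: the last run left a failure marker behind."""


class FailureMarker:
    """Persist a failure across runs using a marker file (assumed mechanism)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record_failure(self, reason: str) -> None:
        # Called by the data service when it fails to process something.
        self.path.write_text(reason)

    def clear(self) -> None:
        # Called once someone has looked at the failure and dealt with it.
        self.path.unlink(missing_ok=True)

    def refuse_if_failed(self) -> None:
        # Called on startup: do not begin a new run on top of a bad one.
        if self.path.exists():
            raise PreviousRunFailed(
                "Last run went wrong: " + self.path.read_text()
            )
```

With something like this, the service would call `refuse_if_failed()` on startup, so a failure in the previous run surfaces at the start of the next one without needing a separate liveness device.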
Now that interaction with external services is moving into a different process, we need to make sure we still handle failure modes correctly. Ideally we would queue up the non-urgent jobs and run them when the services come back online. However, this is complicated. As a first step we should stop Hyperion if any of the following fail:
Acceptance Criteria