DiamondLightSource / hyperion

Unattended Data Collection using BlueSky / Ophyd
BSD 3-Clause "New" or "Revised" License

Stop Hyperion if external interactions fail #1064

Open DominicOram opened 6 months ago

DominicOram commented 6 months ago

Now that interaction with external services is moving into a separate process, we need to make sure we still handle failure modes correctly. Ideally we would queue up the non-urgent jobs and run them when the services come back online, but this is complicated. As a first step, we should stop Hyperion if any of the following fail:

Acceptance Criteria

dperl-dls commented 6 months ago

We should, at the start of an experiment:

This should be fairly easy to handle in __main__.py using some derivative of the monitoring code at https://github.com/DiamondLightSource/hyperion/blob/947_run_callbacks_in_separate_process/tests/system_tests/external_interaction/callbacks/test_external_callbacks.py

DominicOram commented 6 months ago

Is checking at the start enough? If we only check at the start, it might not be obvious that it's the data from the previous run that is potentially corrupted.

dperl-dls commented 6 months ago

Ah yeah, that's not enough. I think the simplest way to get the info back to Hyperion is for the data service to remember if it failed something, and refuse to start if the last run went wrong. Otherwise we need some kind of DataserviceLivenessDevice and I don't like where that would be going...
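
For illustration, the "remember a failure and refuse to start" idea could look roughly like the sketch below. This is not Hyperion code: `FailureMarker`, `LastRunFailedError`, the marker file name and the failure reason string are all hypothetical, and it assumes the data service can write a small file that survives a restart.

```python
import json
import tempfile
from pathlib import Path


class LastRunFailedError(RuntimeError):
    """Raised when a previous run recorded a failure that was never cleared."""


class FailureMarker:
    """Hypothetical helper: persists a 'last run failed' flag on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record_failure(self, reason: str) -> None:
        # Called by the data service when an external interaction fails.
        self.path.write_text(json.dumps({"reason": reason}))

    def clear(self) -> None:
        # Called once someone has looked at the failure and acknowledged it.
        self.path.unlink(missing_ok=True)

    def check_or_refuse(self) -> None:
        # Called at startup: refuse to start if the last run went wrong.
        if self.path.exists():
            reason = json.loads(self.path.read_text())["reason"]
            raise LastRunFailedError(f"Previous run failed: {reason}")


def demo() -> str:
    """Simulate a failed run followed by an attempted restart."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = FailureMarker(Path(tmp) / "last_run_failed.json")
        marker.check_or_refuse()  # nothing recorded yet, so startup proceeds
        marker.record_failure("external deposition failed")
        try:
            marker.check_or_refuse()  # the "restart" after the failed run
        except LastRunFailedError as err:
            return str(err)
    return "started"


if __name__ == "__main__":
    print(demo())
```

The point of keeping the flag on disk rather than in memory is that it survives the service being restarted, so the failure is reported against the run that actually caused it rather than being silently lost.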