LaboratoireMecaniqueLille / crappy

Command and Real-time Acquisition Parallelized in Python
https://crappy.readthedocs.io/en/stable/
GNU General Public License v2.0

Retry on connection loss #126

Closed occoder closed 2 weeks ago

occoder commented 2 weeks ago

Hi, thanks for bringing up this brilliant project. I've read through the tutorials. In the custom object definition, the open method is where the connection to the device is established. My question is: how does Crappy handle connection/communication loss while an experiment is ongoing? Is there a retry mechanism or strategy already built in, or any best practice that can be shared? Thanks

WeisLeDocto commented 2 weeks ago

Hi! I'm a bit confused as to whether you're referring to a specific object in Crappy (maybe the custom Block presented in the tutorials?), or if it's just a general question.

Generally speaking, there is no need for a specific module-wide mechanism to handle connection loss during an experiment in Crappy. That is because either the Block, InOut, Camera, or Actuator object temporarily stops working while waiting for re-connection, or it simply crashes if nothing is implemented for handling connection loss. It all depends on what is implemented at the object's level.

For equipment connected over a wire, it is normally expected that the connection will hold throughout the test. Most if not all objects in Crappy operating over a wired connection will simply crash if the wire is unplugged. Conversely, the ClientServer Block that sends data over a wireless connection can handle temporary connection loss just fine.

And if your question is about the specific example described in the tutorials, it is a good example of a Block that would benefit from a re-connection mechanism, but it is simply not implemented there.
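
A minimal sketch of what such a re-connection mechanism could look like, in plain Python and without importing Crappy. Everything here is an assumption made for illustration: `FlakyDevice` is a hypothetical stand-in for a real driver, and `ReconnectingReader` only mirrors the open / read / close life cycle discussed in this thread; it is not code from the tutorials.

```python
import time


class FlakyDevice:
    """Hypothetical driver whose link drops on chosen read indices."""

    def __init__(self, fail_on=()):
        self.connected = False
        self.opens = 0
        self.reads = 0
        self._fail_on = set(fail_on)  # 1-based read indices that fail

    def connect(self):
        self.opens += 1
        self.connected = True

    def disconnect(self):
        self.connected = False

    def read(self):
        self.reads += 1
        if self.reads in self._fail_on:
            self.connected = False
            raise ConnectionError("link lost")
        return 42.0


class ReconnectingReader:
    """Re-opens the device when a read fails, at most max_reconnects times."""

    def __init__(self, device, max_reconnects=3, delay=0.0):
        self._dev = device
        self._max = max_reconnects
        self._delay = delay  # pause before each re-connection, in seconds

    def open(self):
        self._dev.connect()

    def get_data(self):
        for attempt in range(self._max + 1):
            try:
                return [time.time(), self._dev.read()]
            except ConnectionError:
                if attempt == self._max:
                    raise  # give up: let the caller decide what to do
                self._dev.disconnect()
                time.sleep(self._delay)
                self._dev.connect()

    def close(self):
        self._dev.disconnect()
```

Once the re-connection budget is exhausted the exception propagates, so the object still crashes as it would without any handling; only transient losses are absorbed.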

occoder commented 2 weeks ago

Thank you for the timely response. This is a general question: I'd like to know whether Crappy already addresses this aspect. The issue is often encountered in real experiments. Potential causes include:

  1. the device's receive buffer overflows when flooded with requests
  2. a weak protocol that does not tolerate bad packets, which in turn crashes the communication
  3. the device is busy processing other tasks and appears unresponsive when accessed

WeisLeDocto commented 2 weeks ago

Can you indicate what kind of device you're interfacing with? Over which protocol?

We've had similar issues when trying to perform high-speed I2C communication. Our only options were to implement a timeout + retries scheme in the InOut objects we wrote, or to lower the data rate. But again, this is specific to the devices we used, and was only implemented at the InOut level.
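
For illustration, a timeout + retries scheme can be sketched in a few lines of plain Python. This is not the code from the InOut objects mentioned above: the `query` helper and its `send` / `receive` callables are hypothetical, and a real I2C driver would usually enforce the timeout at the bus level instead of polling.

```python
import time


def query(send, receive, retries=3, timeout=0.05, poll=0.001):
    """Issue a request and wait for its answer, re-sending on timeout.

    send:    callable that (re-)issues the request to the device
    receive: callable returning the answer, or None if none is available yet
    retries: number of extra attempts after the first one
    timeout: time allowed for each attempt, in seconds
    poll:    pause between two calls to receive, in seconds
    """
    for _ in range(retries + 1):
        send()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            answer = receive()
            if answer is not None:
                return answer
            time.sleep(poll)
    raise TimeoutError(f"no answer after {retries + 1} attempts")
```

Bounding both the wait per attempt and the number of attempts keeps a silent device from stalling the acquisition loop forever; what to do once `TimeoutError` is raised remains a decision for the object that calls the helper.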

occoder commented 2 weeks ago

> Can you indicate what kind of device you're interfacing with? Over which protocol?

It's a general concern. Most of our experiments run unattended overnight: they are time-consuming, so we take advantage of the night, when everybody is off work. Finding out the next day that communication was lost during the night is very disappointing and annoying. Hope Crappy could implement some fault-tolerant design in a more generic manner.

WeisLeDocto commented 2 weeks ago

> Hope Crappy could implement some fault-tolerant design in a more generic manner.

The difficulty here is that Crappy is merely a conductor synchronizing and leading Python processes. Some pieces of hardware are included in the codebase for our own convenience, but the core is really just a complex process-management algorithm + some helper Blocks. And as such, Crappy has no way to know what Blocks, InOuts and other objects are doing, nor how they're doing it.

So, if an InOut fails, is that critical? Should the experiment be stopped, or can it continue? Can this InOut be re-started? How many times should it be re-started if it keeps failing? The answers to these questions are highly dependent on the context of the experiment. Therefore, I think it makes the most sense to leave these design choices up to developers writing their custom objects.

The only generic-enough feature I can think of would be some flag indicating at the Block level whether a crash during the experiment should stop Crappy, or if it can be safely ignored.
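
To make the idea concrete, here is a toy sketch of such a flag. It does not describe an existing Crappy feature: the `Task` class, its `critical` attribute and the `run` loop are all hypothetical, and a single sequential loop stands in for the separate processes Crappy actually manages.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    step: Callable[[], None]  # one iteration of the task's work
    critical: bool = True     # hypothetical flag: does a crash stop the run?
    alive: bool = True


def run(tasks: List[Task], n_loops: int) -> List[str]:
    """Run every task n_loops times and return the names of dropped tasks.

    A failing non-critical task is dropped and the run continues; a failing
    critical task aborts the whole run with a RuntimeError.
    """
    dropped = []
    for _ in range(n_loops):
        for task in tasks:
            if not task.alive:
                continue
            try:
                task.step()
            except Exception as exc:
                if task.critical:
                    raise RuntimeError(
                        f"critical task {task.name!r} failed") from exc
                task.alive = False
                dropped.append(task.name)
    return dropped
```

With this layout a logger could be declared with `critical=False` and be dropped silently, while an actuator would keep the default and stop the test as soon as it fails.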

WeisLeDocto commented 2 weeks ago

Closing this issue now as the original question was answered and there is no clear way to implement the requested feature.