czbiohub-sf / shrimPy

shrimPy: Smart High-throughput Robust Imaging & Measurement in Python
BSD 3-Clause "New" or "Revised" License

Acquisition finishes but fails to cleanup #41

Closed edyoshikun closed 1 year ago

edyoshikun commented 1 year ago

Sometimes the acquisition finishes, but the cleanup flag is never set and the cleanup code does not run, so acq_engine.acquire stalls while waiting for the acquisition to finish. The last log line is:

2023-05-16 18:12:17,443 - DEBUG - acq_engine.acquire - Waiting for acquisition to finish

Closing PowerShell or pressing Ctrl+C to kill the process causes errors on the Light Sheet arm hardware. Many devices are left with open ports, so Micro-Manager cannot connect to them, which leads to COM connection errors.

I usually restart the computer to fix the issue.

mattersoflight commented 1 year ago

@edyoshikun, @ieivanov the resource management can be handled cleanly by writing the acquisition manager classes as context managers. I think that [acquisition.acq_engine.BaseChannelSliceAcquisition.setup](https://github.com/czbiohub/mantis/blob/6085b0420b52ad147e4a12831488276846b07573/mantis/acquisition/acq_engine.py#L157) should map to the __enter__ dunder method and acquisition.acq_engine.BaseChannelSliceAcquisition.reset should map to the __exit__ dunder method.

This StackOverflow post suggests that the __exit__ method is also a good place to retry cleaning up the resources.
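A rough sketch of what that mapping could look like, with retried cleanup in __exit__. The class below is a hypothetical stand-in, not the actual mantis BaseChannelSliceAcquisition; the `reset_retries` and `retry_delay_s` parameters and the placeholder bodies of `setup` and `reset` are assumptions for illustration only.

```python
import time


class ChannelSliceAcquisition:
    """Hypothetical stand-in for an acquisition manager (not the mantis class)."""

    def __init__(self, reset_retries=3, retry_delay_s=0.0):
        self.reset_retries = reset_retries  # assumed parameter
        self.retry_delay_s = retry_delay_s  # assumed parameter
        self.is_setup = False
        self.reset_attempts = 0

    def setup(self):
        # Placeholder: acquire hardware resources here.
        self.is_setup = True

    def reset(self):
        # Placeholder: release hardware resources here.
        self.reset_attempts += 1
        self.is_setup = False

    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        last_error = None
        for _ in range(self.reset_retries):
            try:
                self.reset()
                break  # cleanup succeeded, stop retrying
            except Exception as error:
                last_error = error
                time.sleep(self.retry_delay_s)
        else:
            # Every retry failed. Only raise if the body itself did not fail,
            # so the original exception is not masked.
            if exc_type is None:
                raise RuntimeError("reset failed after retries") from last_error
        return False  # never suppress exceptions raised inside the with block
```

With this shape, `with ChannelSliceAcquisition() as acq: ...` calls `reset` even when the body raises, which is the property the comment above is after.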

ieivanov commented 1 year ago

The problem here is not so much with our Python code, but rather that the headless MM Java process which PM starts is not shut down properly if the mantis acquisition crashes. My workflow is to shut it down manually by ending the "Java something something SE" process in the Task Manager; you'll recognize it by the SE in its name. Then you can launch MM again without having to restart the computer. The shutdown of this process is handled by PM using atexit, but it seems that the handler doesn't execute when the Python program crashes.
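For context, the Python documentation says atexit handlers are skipped when the process is killed by a signal Python does not handle, on a fatal interpreter error, or when os._exit() is called. One possible way to tie a child process to a scope instead is sketched below. `managed_process` and `STAND_IN_COMMAND` are hypothetical names, and the stand-in command just sleeps; this is not how PM launches the headless MM process. A hard kill of the parent would still skip this cleanup, so it only covers exceptions and Ctrl+C raised inside the block.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(command, timeout_s=5.0):
    """Start a child process and make sure it is stopped when the block exits."""
    process = subprocess.Popen(command)
    try:
        yield process
    finally:
        if process.poll() is None:  # child is still running
            process.terminate()  # ask it to stop
            try:
                process.wait(timeout=timeout_s)
            except subprocess.TimeoutExpired:
                process.kill()  # force it if it does not stop in time
                process.wait()


# Hypothetical stand-in for the headless Java process: a child that sleeps.
STAND_IN_COMMAND = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Usage would be `with managed_process(STAND_IN_COMMAND) as proc: ...`, with the acquisition running inside the block.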

mattersoflight commented 1 year ago

@ieivanov

> The problem here is not so much with our python code, but rather that the headless MM java process which PM starts is not shut down properly if the mantis acquisition crashes.

I agree that the source of the problem is not our Python code, but the resource management problems that occur when MM or PM crashes. I think the context manager pattern would let us make acquisitions more robust when MM or PM dies.

ieivanov commented 1 year ago

Fixed by #81