ZhuangLab / storm-control

Microscope control software
Other
65 stars 67 forks source link

Bug in updated Marzhauser Module Class #106

Open jeffmoffitt opened 4 years ago

jeffmoffitt commented 4 years ago

We have found that the on our microscopes, which use Marzhauser stages, the 'Current Position' values in the stage display no longer update periodically. They update if we command a move with Dave or with the 'Move To' controls in the stage display.

We have traced the problem to the last changes to the marzhauserModule.py (commit ec3d37ab). If we checkout of the version of this file from the parent commit (85bffb4), the stage position updates properly.

Perhaps in handling the garbled messages from the stage position requests, periodic polling of the stage position was broken? Happy to help address and fix the issue, but we also don't want to accidentally break the handling of these garbled messages.

HazenBabcock commented 4 years ago

Do you mean that the problem is that if you move the stage with the stage controller joystick the position isn't getting updated?

Anyway, yes, it appears that periodic polling was broken or removed. The updateTimer is now configured to be single shot. Maybe the idea was restart the timer in handleUpdateTimer? However it's been a while and I'm also nervous about changing this again. I believe that the underlying problem is that during automated runs there was a probability that these stages would just stop responding to position update requests even though they would continue to work properly. If we kept polling then the acquisition computer would run out of thread resources and the experiment would lock. This was considered a worse outcome than not getting position feedback from the stage so periodic polling was removed. Not sure if just one extra hung thread was enough to bring down the acquisition computer, but if it was that might be why the timer is not restarted in handleUpdateTimer and why we are also just assuming that the stage is where we told it to be (see goAbsolute). A possible work around is to change this stage to watch for TCP connection messages, and when there was TCP connection it would turn off periodic polling. Presumably when HAL is in manual mode and the stage locks this will be less annoying than if it happens in the middle of a 24 hour experiment.

@seichhorn could comment?

jeffmoffitt commented 4 years ago

Yes, if we move the stage with a joystick, the position does not update. Similarly if we you any of the move arrows, then we observed the same problem. I had the same sense that there was a polling problem, but given that we do not see a message problem on our scope, I did not want to suggest fixes that would undo the fixes for this problem.

Happy to discuss alternative fixes and help if we can.

HazenBabcock commented 4 years ago

Could you try commenting out setting self.updateTimer.setSingleShot(True)?

https://github.com/ZhuangLab/storm-control/blob/768d1cc64591793b2d6e6f2a18c55f745de08217/storm_control/sc_hardware/marzhauser/marzhauserModule.py#L31-L33

And let me know if that works. If it does then I think the easiest thing is to make continuous polling an optional parameter, default to True and turn it off on setups where this causes problems.

seichhorn commented 4 years ago

Hazen's summary is accurate. We've had the issue on multiple microscopes with marzhauser stages that if we allow periodic polling of positions we frequently encounter partial responses (i.e. if it was to return an X and Y coordinate it would return just an X, or an X and part of a Y, or just part of an X). The thread receiving the bad message never resolves and after we pile up several frozen threads the computer runs out of threads and the run freezes.

We tried several things to solve the issue while maintaining periodic polling, but none resolved the issue completely. One of the last things we tried was using a regex to process only correct messages and pass those that are incorrect did not fully resolve the issue (I cannot remember what the ongoing issue was at this point).

@jeffmoffitt Do your systems not have issues with freezing if you have periodic polling enabled?

HazenBabcock commented 4 years ago

My memory is a bit hazy on this, but I believe that the underlying issue, at least on that setup, is that every once in a while self.stage.readline() would hang forever even though it has a timeout.

https://github.com/ZhuangLab/storm-control/blob/768d1cc64591793b2d6e6f2a18c55f745de08217/storm_control/sc_hardware/marzhauser/marzhauserModule.py#L107-L110

If we only polled for a position when we got a response then polling would break the first time we failed to parse a stage response, nothing would get returned to trigger another position request. If we polled every X seconds then eventually self.stage.readline() would hang, threads would stack up and the run would die. This might have been solvable by returning something even for unparseable strage responses, but then this might not work because the unparseable response could have come from a stage move instead of a position request. Not an easy problem to solve.

jeffmoffitt commented 4 years ago

@seichhorn @HazenBabcock

Just wanted to revive this thread and this outstanding issue.

To your question, I have not encountered this problem to the best of my knowledge either with any of our scopes (which have not had any freezing issues) or with the scopes I was using in the Zhuang lab.

My guess is that this is a hardware issue that has arisen with time in your systems, and I would suggest a hardware fix. For example, I did have to replace serial communication modules on some systems in the Zhuang lab--lasers not stages. It is worth contacting Marzhauser with this issue.

Personally, I do not prefer a solution in which we disable a useful component of the software interface to this stage because we have a few faulty stage control systems.

Nonetheless, I recognize that this suggestion is not the most expedient or useful. So how about a compromise? Can you add a parameter to the stage component of the config.xml that enables or disables periodic polling for these stages? Then you can turn this behavior off for systems that can't communication robustly with the stage.

aaronhalpern commented 3 years ago

@HazenBabcock @jeffmoffitt

I too want to re-revive this issue. A number of new MERFISH setups in the Zhuang lab have run into the setSingleShot(True) issue, that has added confusion to making mosaics in Steve without the stage position update. I thought @jeffmoffitt suggestions was good, to enable polling through the config.xml. However, would it would be more convenient if we could toggle polling using the HAL settings files? Then you could setup your experiment with stage polling enabled to do the mosaic etc, then Dave could take over using a settings file with polling disabled. I have attempted to implement this here and it seems to be working:

https://github.com/aaronhalpern/storm-control/blob/marztest/storm_control/sc_hardware/marzhauser/marzhauserModule.py

One thing I want to confirm is whether this is indeed the correct way to toggle the polling (line 246):

elif (pname == "polling"):

print('set the polling to ' + str(p.get("polling")))

            if p.get("polling"):
                self.stage_functionality.polling_thread.startPolling()
            else:
                self.stage_functionality.polling_thread.stopPolling()

In addition, I have also added the joystick enable/disable parameter in the settings. One of the Marzhauser joysticks drifts even when it not being touched, so disabling it when it is misbehaving would be helpful. I realize these changes do not get at the root cause of failed readline commands as @HazenBabcock mentioned above, but hopefully the decreased polling might alleviate it somewhat… Anyways your comments would be greatly appreciated if the above sounds reasonable.

jeffmoffitt commented 3 years ago

Hi Aaron,

Just a quick question on this point: how many setups are having this polling issue? I am in favor of a combined solution, involving the ability to turn off this functionality for a given setup (via the config.xml file) and having the hardware repaired that is giving this issue.

Happy to follow up outside of github comments on this point if helpful.

Best Jeff

On Mon, Mar 22, 2021 at 11:25 AM aaronhalpern @.***> wrote:

@HazenBabcock https://github.com/HazenBabcock @jeffmoffitt https://github.com/jeffmoffitt

I too want to re-revive this issue. A number of new MERFISH setups in the Zhuang lab have run into the setSingleShot(True) issue, that has added confusion to making mosaics in Steve without the stage position update. I thought @jeffmoffitt https://github.com/jeffmoffitt suggestions was good, to enable polling through the config.xml. However, would it would be more convenient if we could toggle polling using the HAL settings files? Then you could setup your experiment with stage polling enabled to do the mosaic etc, then Dave could take over using a settings file with polling disabled. I have attempted to implement this here and it seems to be working:

https://github.com/aaronhalpern/storm-control/blob/marztest/storm_control/sc_hardware/marzhauser/marzhauserModule.py

One thing I want to confirm is whether this is indeed the correct way to toggle the polling (line 246):

elif (pname == "polling"):

print('set the polling to ' + str(p.get("polling")))

if p.get("polling"): self.stage_functionality.polling_thread.startPolling() else: self.stage_functionality.polling_thread.stopPolling()

In addition, I have also added the joystick enable/disable parameter in the settings. One of the Marzhauser joysticks drifts even when it not being touched, so disabling it when it is misbehaving would be helpful. I realize these changes do not get at the root cause of failed readline commands as @HazenBabcock https://github.com/HazenBabcock mentioned above, but hopefully the decreased polling might alleviate it somewhat… Anyways your comments would be greatly appreciated if the above sounds reasonable.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ZhuangLab/storm-control/issues/106#issuecomment-804149947, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUJAXGMXBCGPIAXE6CEPLLTE5OORANCNFSM4LCMLBCA .