Open Rodgers-PAC-Lab opened 2 years ago
success?? I think that adding self.node.sock.close()
in both the Parent and Child end()
functions might fix this.
https://github.com/Rodgers-PAC-Lab/autopilot/commit/ec78bc95f9751013cb8a0e171e41c8170fb1db5b
along with the aforementioned self.node.router.close()
, which is only needed in the Parent, for the special Net_Node that was initialized with a router_port
.
https://github.com/Rodgers-PAC-Lab/autopilot/commit/a10c0bdc0590debe037984407156624633444273
It seems perhaps there are some dangling sockets left open, and this is causing problems on subsequent session starts. I'll continue testing and if this works I would probably suggest adding self.node.sock.close()
and if self.node.router: self.node.router.close()
within Net_Node.release
here: https://github.com/wehr-lab/autopilot/blob/90956187d4222f16f67ab8b39b8359da954d5dcc/autopilot/networking/node.py#L642
Awesome! yes i have been super sloppy with closing stuff, agree this is super annoying, will revisit after this current release
Figured out a reliable way to kill the IOLoop objects that keep the networking objects open. It's sort of a careful dance of trying to hold threads and processes open with a blocking IOLoop.start()
call but then interact with them through callbacks launched by the on_recv
callback of a ZMQSteam
which are in the same process but different thread. Most of the operations of the IOLoop aren't thread safe, in fact the only one that is is the add_callback
method. So by adding a callback that does this
self.loop.add_callback(lambda:IOLoop.instance().stop())
you can reliably kill them. You're right, also need to close the sockets. I'll work on cleaning this up in v0.5.0, which i'm preparing now.
a few other things I discovered that seem to be important for cleanly shutting down, at least over here. I haven't extensively tested them yet, but I'm just mentioning it now, in case it rings a bell. Now I can cleanly shut down the terminal, and I never have to restart the autopilot process on the Pi between sessions, which wasn't the case for me before in our setup. I can test these out and make PRs after 0.5.0 too.
fairly confident the flush
and close
should be inside the finally
here, else they won't happen some times, resulting in hdf5 files getting left open till the terminal exits
https://github.com/wehr-lab/autopilot/blob/90956187d4222f16f67ab8b39b8359da954d5dcc/autopilot/core/pilot.py#L801
I had to add a self.node.sock.close()
, a self.node.release()
, and a self.networking.release()
right after this line: https://github.com/wehr-lab/autopilot/blob/90956187d4222f16f67ab8b39b8359da954d5dcc/autopilot/core/terminal.py#L952
in order to get the terminal to properly shut down when closing it
I did end up adding a self.sock.close()
here https://github.com/wehr-lab/autopilot/blob/90956187d4222f16f67ab8b39b8359da954d5dcc/autopilot/networking/node.py#L642
Seems like a good thing to do when releasing a node
yes i just arrived at the same changes ;). working my way through some more of the eternal technical debt, thanks for the pointers as well, very helpful.
For a while I've been using a Parent Pi connected to multiple Child Pi (as discussed here: https://github.com/wehr-lab/autopilot/issues/101). This works, but with the annoying problem that the autopilot process on each Child has to be manually quit and restarted between every session, or otherwise we get a ZMQError. I think the Child is not stopping correctly.
Full code for a minimal example (but see also relevant excerpts below if that is easier): The Parent task: https://github.com/Rodgers-PAC-Lab/autopilot/blob/paft2022/autopilot/tasks/paft_parent_child.py The Child task: https://github.com/Rodgers-PAC-Lab/autopilot/blob/b2105803b7bd4e36e39ac1a6f662ec93f0c85c96/autopilot/tasks/children.py#L35
Here is an excerpt of the code in init for the Parent task. The first Net_Node tells the Child to start, and the second Net_Node is used for bidirectional communication with the Child.
Here is the code in the init for the Child task. This Net_Node is used to communicate with the Net_Node of the same name in the Parent.
All of the above actually works! However, if we stop the session and start a new one, then the Parent generates an exception when we try to create
self.node2
again:zmq.error.ZMQError: Address already in use
I've come up with two semi-fixes, neither of which really fixes the whole problem. First, to fix ZMQError, I have this line
self.node2.router.close()
in theend()
function of the Parent task. After this fix, the ZMQError stops happening. Note that it's not sufficient to haveself.node2.release()
, so perhaps closing the router needs to be added as a step in releasing a Net_Node.Second, I explicitly tell the Child class to end, by adding
self.node.send(to=prefs.get('NAME'), key='CHILD', value={'KEY': 'STOP'})
in theend
function of the Parent task. Does this look right? I think this is probably a good thing to do, though I don't see it in the example GoNoGo/Wheel_Child code.Nonetheless, even after these fixes, I haven't yet succeeded in starting a Child task again, without rebooting the autopilot process. Any tips would be much appreciated!!
PS - I think we could also test/debug this in the GoNoGo/Wheel_Child standard task, but that one is not running for me, not sure if I am missing some hardware or what. It says something about a KeyError 'F' in initting some gpio pins.