PolishookDavid / LAST_OCS

Code controlling the LAST project Observatory

Shutdown can't disconnect slaves because "there are still processes listening on udp port 8502"; this also causes problems when reconnecting #17

Closed noralinn closed 5 months ago

noralinn commented 5 months ago

Issue seen on last08e (log below), on branch j2000inheader, updated to the most recent version.

In addition, I hit the Java error that kills MATLAB after 5 min; it appeared when running astrometry. I'm not sure whether the two are connected.

>> Unit.shutdown
{obs.unitCS}   parking the mount...
{obs.unitCS}   disconnecting devices and slave sessions...
{obs.util.SpawnedMatlab} there are still processes listening on udp port 8502
{obs.util.SpawnedMatlab} killing process 183732
bash: line 0: kill: (183732) - No such process
{obs.util.SpawnedMatlab} there are still processes listening on udp port 8503
{obs.util.SpawnedMatlab} killing process 25664
bash: line 0: kill: (25664) - No such process
{obs.util.SpawnedMatlab} there are still processes listening on udp port 8504
{obs.util.SpawnedMatlab} killing process 25918
bash: line 0: kill: (25918) - No such process
{obs.unitCS}   powering off cameras and mount

ans = 

  unitCS with properties:

         PowerSwitch: {1×2 cell}
               Mount: [1×1 inst.XerxesMountBinary]
          MountPower: 0
              Camera: {1×4 cell}
         CameraPower: [0 0 0 0]
             Focuser: {1×4 cell}
     LocalTelescopes: []
    RemoteTelescopes: {[1]  [2]  [3]  [4]}
               Slave: {1×4 cell}
         Temperature: [22.5 21.8]

>> Unit.connect
{inst.XerxesMountBinary} Trying to connect to inst.XerxesMountBinary at /dev/ttyS5
{inst.XerxesMountBinary} Xerxes mount found on /dev/ttyS5
{inst.XerxesMountBinary} Loading post connection configuration for Mount 08_1
{obs.unitCS} spawning slave 1
{obs.util.SpawnedMatlab} destination ports already used by PID 178382 on last08e
{obs.util.SpawnedMatlab} maybe .connect instead? if not, kill these processes first
{obs.unitCS} spawning slave 2
{obs.unitCS} spawning slave 3
{obs.unitCS} spawning slave 4
{obs.util.Messenger} 08_slave_2.Messenger timed out waiting for a reply to "Unit=obs.unitCS('08_slave_2');"

ans = 

  unitCS with properties:

         PowerSwitch: {1×2 cell}
               Mount: [1×1 inst.XerxesMountBinary]
          MountPower: 1
              Camera: {1×4 cell}
         CameraPower: [1 1 1 1]
             Focuser: {1×4 cell}
     LocalTelescopes: []
    RemoteTelescopes: {[1]  [2]  [3]  [4]}
               Slave: {1×4 cell}
         Temperature: [22.5 22]

>> Unit.checkWholeUnit(0,1)
{obs.unitCS} Checking definitions and connections of unit 08:
{obs.unitCS} Slave 1 status: "disconnected"
{obs.unitCS} creation of the slave 1 anew
{obs.unitCS} spawning slave 1
{obs.util.SpawnedMatlab} destination ports already used by PID 178382 on last08e
{obs.util.SpawnedMatlab} maybe .connect instead? if not, kill these processes first
{obs.unitCS} Slave 2 status: "alive"
{obs.unitCS} Slave 3 status: "alive"
{obs.unitCS} Slave 4 status: "alive"
{obs.unitCS} mount is powered
{obs.unitCS} checking status of camera 1
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} retrieved no camera 1 status
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} trying plain reconnect of camera 1
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} retrieved no camera 1 status
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} trying power cycle and reconnect of camera 1
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} retrieved no camera 1 status
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} checking status of camera 2
{obs.unitCS} camera 2 is idle, good
{obs.unitCS} checking status of camera 3
{obs.unitCS} camera 3 status is unknown, bad sign
{obs.unitCS} camera QHY600M-976b2c47afb86651b is not even known registered on the computer
{obs.unitCS} check if the camera is physically connected and powered,
{obs.unitCS}   or otherwise check that the obs.camera configuration file is correct
{obs.unitCS} trying plain reconnect of camera 3
{obs.unitCS} camera 3 status is unknown, bad sign
{obs.unitCS} camera QHY600M-976b2c47afb86651b is not even known registered on the computer
{obs.unitCS} check if the camera is physically connected and powered,
{obs.unitCS}   or otherwise check that the obs.camera configuration file is correct
{obs.unitCS} trying power cycle and reconnect of camera 3
{obs.unitCS} camera 3 status is unknown, bad sign
{obs.unitCS} camera QHY600M-976b2c47afb86651b is not even known registered on the computer
{obs.unitCS} check if the camera is physically connected and powered,
{obs.unitCS}   or otherwise check that the obs.camera configuration file is correct
{obs.unitCS} checking status of camera 4
{obs.unitCS} camera 4 is idle, good
{obs.unitCS} anomalous gain value of 4294967295.000000 means something fishy
{obs.unitCS} trying plain reconnect of camera 4
{obs.unitCS} camera 4 is idle, good
{obs.remoteClass} invalid or uninitialized remote class 
{obs.remoteClass} invalid or uninitialized remote class 
{obs.unitCS} Focuser 1 status not retrieved
{obs.unitCS} focuser 2 check passed
{obs.unitCS} focuser 3 check passed
{obs.unitCS} focuser 4 check passed
{obs.unitCS} check failed, even after remediation!

ans =

  logical

   0

>>

EastEriq commented 5 months ago

This would be my explanation.

First:

{obs.util.SpawnedMatlab} there are still processes listening on udp port 8502
{obs.util.SpawnedMatlab} killing process 183732
bash: line 0: kill: (183732) - No such process

etc. -- this is a bogus but innocuous report. It is fixed by https://github.com/EastEriq/LAST_Messaging/commit/948a6ae91eac8ac7b21b69ed0a362f663e45cd32 and https://github.com/EastEriq/LAST_Messaging/commit/336816d3182c276f657570f73cf20f46449f7e30, which I committed yesterday but which had not yet been pulled on last08e/w.
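
For illustration only, here is a minimal Python sketch of one way to avoid the spurious "No such process" message seen above: check that a PID still exists before signalling it. This is not what the linked commits do (their content is not reproduced here); the helper names `pid_is_alive` and `kill_if_alive` are hypothetical, and the sketch assumes a POSIX system.

```python
import os
import signal


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; it only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        # Same condition that bash reports as "kill: (PID) - No such process".
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def kill_if_alive(pid: int, sig: int = signal.SIGTERM) -> bool:
    """Send sig only if the PID still exists; return True if a signal was sent."""
    if not pid_is_alive(pid):
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process exited between the check and the kill.
        return False
    return True
```

With a guard of this kind, a process that has already exited is skipped silently instead of producing an error line from `kill`.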

Then:

>> Unit.connect
...
{obs.unitCS} spawning slave 1
{obs.util.SpawnedMatlab} destination ports already used by PID 178382 on last08e
{obs.util.SpawnedMatlab} maybe .connect instead? if not, kill these processes first

is a bit puzzling, because PID 178382 was supposedly terminated by the previous shutdown. I imagine it might be a residual glitch of what I corrected with those last commits to LAST_Messaging.

On the other hand, you mentioned on WhatsApp that you ran into https://github.com/blumzi/LAST_issues/issues/16. If that happened when opening focuser 1, I imagine that a) the slave remained stuck for 5 minutes while still holding the UDP port, and b) the uncorrected shutdown didn't really kill it. That would then explain why checkWholeUnit reported

{obs.unitCS} Slave 1 status: "disconnected"
{obs.unitCS} creation of the slave 1 anew
{obs.unitCS} spawning slave 1
{obs.util.SpawnedMatlab} destination ports already used by PID 178382 on last08e
{obs.util.SpawnedMatlab} maybe .connect instead? if not, kill these processes first

and why, having failed either to connect to the existing slave (which is stuck for 5 minutes) or to spawn a new one (which, at least in the old version, it conservatively doesn't do, so as not to increase the mess), it subsequently finds neither camera 1 nor focuser 1.
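
The situation described above can be sketched in Python as follows. This is a simplified model, not the actual obs.util.SpawnedMatlab logic: `udp_port_is_free` and `plan_slave_action` are hypothetical names, and the three outcomes are my reading of the behaviour described in this comment.

```python
import socket


def udp_port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a UDP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process still holds the port.
            return False
    return True


def plan_slave_action(port_free: bool, slave_replies: bool) -> str:
    """Decide what to do about a slave, given the port state and its liveness."""
    if port_free:
        return "spawn"    # nothing is listening: safe to start a new slave
    if slave_replies:
        return "connect"  # port is held by a responsive slave: reuse it
    # Port is held by a stuck process: neither connect nor spawn succeeds.
    return "give up"
```

In the case discussed here, the port is occupied and the slave does not reply, so the sketch lands on the last branch, which matches the missing camera 1 and focuser 1.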

As for

{obs.unitCS} camera 3 status is unknown, bad sign
{obs.unitCS} camera QHY600M-976b2c47afb86651b is not even known registered on the computer

and

{obs.unitCS} camera 4 is idle, good
{obs.unitCS} anomalous gain value of 4294967295.000000 means something fishy

I think we are in the general category of "shit happens". (Did I already say what happens when you buy Chinese cameras?)
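
As a side note on the gain value: 4294967295 is exactly 2^32 - 1, i.e. all 32 bits set, which is also how -1 looks when read as an unsigned 32-bit integer. Whether the camera driver actually returned an error code of that kind is an assumption on my part, not something established in this thread. A short Python check of the arithmetic (`as_signed32` is a hypothetical helper):

```python
import struct

ANOMALOUS_GAIN = 4294967295  # value printed by checkWholeUnit for camera 4

# 4294967295 is the largest unsigned 32-bit integer, 0xFFFFFFFF.
assert ANOMALOUS_GAIN == 2**32 - 1 == 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_signed32(ANOMALOUS_GAIN))  # prints -1
```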

TL;DR -- Simone rebooted the machine, pulled the latest LAST_Messaging and didn't observe problems afterwards -- I would consider the issue closed, but please reopen it if you see it isn't.