EastEriq / LAST_Messaging

Messaging framework for the LAST project

Do `SpawnedMatlab`s need to be disowned? #4

Open EastEriq opened 3 months ago

EastEriq commented 3 months ago

Probably so, in order to guarantee that the remote units, if launched silently and not in a terminal, survive if the launching machine crashes or just disconnects. Since they are launched with ssh, one probably has to do `ssh ....; disown -ar && exit` or something like that. Or perhaps use `nohup matlab .....`. To be checked. I am thinking in particular of a superunit run on last0, spawning remote units on the various LAST machines.

EastEriq commented 3 months ago

Consider it for https://github.com/PolishookDavid/LAST_OCS/milestone/1

EastEriq commented 3 months ago

What is not OK now: blind units spawned by a superunit survive quitting the matlab that spawned them, but are terminated when logging out (graphically) of the NoMachine session in which the superunit matlab was run. If instead they were started from a `-nodesktop` session in an ssh shell, they survive quitting matlab, but the ssh shell cannot complete logout (^C then kills them all).

I've tried several permutations of `disown -arh` in the spawning shell commands, but haven't found the right one yet.
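A possible reason the `disown` permutations do not help: `disown -h` only stops bash from resending SIGHUP to that job when bash itself receives SIGHUP (or exits with `huponexit` set); it does not change how the process reacts to a SIGHUP delivered to it directly. `nohup` instead sets SIGHUP to ignored before exec'ing the command, and ignored signals stay ignored across exec and in child processes. The local sketch below (no ssh, no matlab; `sleep 30` stands in for the spawned matlab, and `survives_hup` is a hypothetical helper) shows the difference. It does not cover logout paths that end the whole session by other means, e.g. systemd-logind with `KillUserProcesses=yes`, if that is enabled on the host.

```bash
#!/usr/bin/env bash
# Local demo, no ssh and no matlab: `sleep 30` stands in for a spawned matlab.
# survives_hup MODE (hypothetical helper) starts the stand-in in the
# background, sends it SIGHUP and prints "yes" if it is still alive.
# MODE is "plain" (just `&`) or "nohup".
survives_hup() {
  local pid watchdog status
  if [ "$1" = nohup ]; then
    nohup sleep 30 >/dev/null 2>&1 &
  else
    sleep 30 >/dev/null 2>&1 &
  fi
  pid=$!
  sleep 1                          # give nohup time to set SIGHUP to "ignore"
  kill -HUP "$pid"
  # Watchdog: whatever is still running 2 s after the SIGHUP gets SIGTERM.
  ( sleep 2; kill -TERM "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog=$!
  wait "$pid" 2>/dev/null
  status=$?                        # 128 + number of the fatal signal
  kill "$watchdog" 2>/dev/null     # watchdog no longer needed
  wait "$watchdog" 2>/dev/null
  if [ "$status" -eq $((128 + 15)) ]; then
    echo yes                       # ignored SIGHUP, only SIGTERM ended it
  else
    echo no                        # killed by the SIGHUP itself (status 129)
  fi
}

echo "plain background job survives SIGHUP: $(survives_hup plain)"
echo "nohup'ed job survives SIGHUP:         $(survives_hup nohup)"
```

Run with bash on Linux, assuming the calling shell does not already ignore SIGHUP (in which case the plain job would inherit that too); it should report `no` for the plain job and `yes` for the nohup'ed one.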

For SpawnedMatlabs opened with some graphical window, it would also be desirable for the process and its X window to survive if the spawning matlab exits while the hosting X server is still running.

In the long term we may want to start one blind Unit master per machine at boot, as a service, which would not have the problem of surviving logout. The current mode should nevertheless keep working, as an alternative for debugging etc.

Altogether, there are 4 terminal types × 2 logging options × 2 MasterMessenger flavors = 16 variations to test...
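A small sketch for keeping track of these cases: it enumerates the matrix as a numbered checklist. Only the terminal types `none` and `silentx` appear in this issue; `TERM_C`, `TERM_D`, `LOG_ON`/`LOG_OFF` and `FLAVOR_1`/`FLAVOR_2` are hypothetical placeholders to be replaced with the real option names.

```bash
#!/usr/bin/env bash
# Enumerate the spawn-test matrix. Only "none" and "silentx" are terminal
# types named in this issue; all other labels are hypothetical placeholders.
terminal_types=(none silentx TERM_C TERM_D)
logging_options=(LOG_ON LOG_OFF)
messenger_flavors=(FLAVOR_1 FLAVOR_2)

# Print one "terminal logging flavor" line per combination.
list_variations() {
  local t l f
  for t in "${terminal_types[@]}"; do
    for l in "${logging_options[@]}"; do
      for f in "${messenger_flavors[@]}"; do
        printf '%s %s %s\n' "$t" "$l" "$f"
      done
    done
  done
}

list_variations | nl       # numbered checklist, one line per test case
list_variations | wc -l    # 4 x 2 x 2 combinations
```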

EastEriq commented 1 month ago

As of today, the spawning command for a blind unit is:

'ssh -o ConnectTimeout=5 -o PasswordAuthentication=no -o StrictHostKeyChecking=no -fCX ocs@last11e \
   "export LC_CTYPE=en_US.UTF-8;matlab -nosplash -nodesktop \
  -r \"MasterMessenger=obs.util.Listener('10.23.3.21', [],11000);MasterMessenger.start;\"& disown -arh" >/dev/null & \
  disown -arh; echo $?'

The disowns are an attempt committed by mistake; they don't help at all, and their only harm seems to be that they make the code even less readable.

EastEriq commented 1 month ago

This works and survives logout:

ssh -f ocs@10.23.1.7 "nohup matlab -nodisplay -r \
  \"MasterMessenger=obs.util.Listener('localhost',[],11000); \
  Unit=obs.unitCS('04'); MasterMessenger.start\" >/dev/null &"

but it cannot create its slaves, probably because of the `-X` option passed to ssh in `SpawnedMatlab.spawn`. The `>/dev/null` only removes clutter; the process apparently survives logout even if stdout is displayed on the terminal.

EastEriq commented 1 month ago

Interestingly, if X is allowed (e.g. `'silentx'`), merely moving the figure window around on the client X display, without resizing or otherwise interacting with it, makes the spawned matlab session lag and stop responding to Multipanel queries.

EastEriq commented 1 month ago

So the current spawn command for type `'none'` becomes:

ssh -o ConnectTimeout=5 -o PasswordAuthentication=no -o StrictHostKeyChecking=no -f ocs@last10e \
  "export LC_CTYPE=en_US.UTF-8;nohup matlab -nosplash -nodesktop -r \
  \"MasterMessenger=obs.util.Listener('10.23.3.21',[],11000);MasterMessenger.start;\"> \
  >(tee -a ~/log/matlab_master10_20240725143443_stdout.log) 2> \
  >(tee -a ~/log/matlab_master10_20240725143443_stderr.log >&2)"; echo $?
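The nested quoting in this command is easy to break when the host, unit number or timestamp changes. A sketch of a hypothetical shell helper, `spawn_none_cmd`, that only prints (dry run) the command above for given parameters, taking the timestamp from `date +%Y%m%d%H%M%S`. The argument list and the log-file naming are inferred from the example above, not taken from `SpawnedMatlab.spawn`. Note also that the `> >(tee ...)` process substitutions are a bash feature, so the command relies on `ocs` having bash (not plain `sh`) as login shell on the remote machine, since ssh runs the remote command through that shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the type-'none' spawn command for a given host,
# unit number, master IP and port, with timestamped stdout/stderr logs.
# It does not run anything; it only prints the command line.
spawn_none_cmd() {
  local host=$1 unit=$2 master_ip=$3 port=$4
  local stamp log matlab_r
  stamp=$(date +%Y%m%d%H%M%S)                 # e.g. 20240725143443
  log="~/log/matlab_master${unit}_${stamp}"   # literal ~, expanded remotely
  matlab_r="MasterMessenger=obs.util.Listener('${master_ip}',[],${port});MasterMessenger.start;"
  # \\\" yields \" in the output, i.e. an escaped quote for the remote shell.
  printf '%s' \
    "ssh -o ConnectTimeout=5 -o PasswordAuthentication=no -o StrictHostKeyChecking=no -f ocs@${host} " \
    "\"export LC_CTYPE=en_US.UTF-8;nohup matlab -nosplash -nodesktop -r " \
    "\\\"${matlab_r}\\\"> >(tee -a ${log}_stdout.log) 2> >(tee -a ${log}_stderr.log >&2)\"; echo \$?"
  printf '\n'
}

spawn_none_cmd last10e 10 10.23.3.21 11000
```

Printing first makes it easy to compare with what `SpawnedMatlab.spawn` generates; `eval "$(spawn_none_cmd last10e 10 10.23.3.21 11000)"` would then actually launch the unit.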

EastEriq commented 1 month ago

As for a systemctl service command, it would be good to also add the creation of the MasterResponder and the Unit, normally done by `.connect`, so that the session that comes up is immediately usable by the Multipanel or by anyone ready to send commands and queries.