blumzi / LAST_next_generation

Design and implementation of a new LAST architecture
GNU General Public License v3.0
0 stars 0 forks source link

shorten startup times #5

Closed EastEriq closed 5 months ago

EastEriq commented 7 months ago

It is true that in production lipp is going to start at boot, and be constantly on, so starup delays are not a big issue, but in development they are a nuisance. can something be done about? Look:

date; ~/python/LAST_next_generation/bin/last-unit-server start
Tue 23 Jan 2024 09:10:18 UTC
2024-01-23 09:10:19,089 - INFO     - {last-unit-server:<module>:782741:MainThread:140418314725184} -  calling MATLAB FastApi routers maker with cmd="last-matlab -nodisplay -nosplash -batch 'obs.api.ApiBase.makeAuxiliaryFiles; exit'"

19 seconds later (but I had occurrences where it was minutes - zombies blocking sockets?)

09:10:28.567 [DBG] Adding property: obs.api.Locator

... then

2024-01-23 09:10:38,976 - INFO     - {lipp-unit-mount:wait_for_ready:782741:mount-wait-for-ready-thread:140417555220224} -  started
INFO:     Started server process [782741]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
09:12:38.176 [INF] [focuser-1] [pid=788020] lipp-driver-focuser-1: ---- Started ----

2:20 have already passed, but we're not done. I think are good to go only at this point:

09:13:26.723 [WRN] [mount-11] [pid=788165] lipp-driver-mount: A 'probe' method, a 'probe' method, my kingdom for a 'probe' method (the 'inst.XerxesMountBinary' class needs a 'probe' method)

which is 3:07 after startup. Btw, mount has a .probe implemented, but I don't understand what else is going on. There are errors on the way, which deserve separate tickets.

blumzi commented 7 months ago

So, I understand the delays are something that I deliberately inserted and are very annoying !

Sorry to disappoint but that's how long it takes for the matlab processes to become responsive.

If you have any suggestions on how to make them faster (we talked previously about, for example, shortening the addpath sequence, but came to the conclusion that it's a significant effort with minimal gain), please go ahead!

Another way would be not to start all the LIPP processes, just the ones you want to work on. That will require you to get involved in the Python code.

If it's not clear when things are ready for remote access, please let me know and we'll talk about it.

EastEriq commented 6 months ago

Timings have changed since the last time I was at it, but there is still a two minutes bottleneck:

2024-03-14 10:53:56,875 - INFO     - {last-unit-server:<module>:503067:MainThread:139912242399040} -  calling MATLAB FastApi routers maker with cmd="last-matlab -nodisplay -nosplash -batch 'obs.api.ApiBase.makeAuxiliaryFiles; exit'"
2024-03-14 10:53:56,878 - INFO     - {last-unit-server:<module>:503067:MainThread:139912242399040} -  Waiting for MATLAB FastApi routers maker
AstroPack startup addpath count: 34
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 34
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
10:56:03.204 [INF] [camera-1] [pid=503102] lipp-driver-camera-1: ---- Started ----
10:56:03.252 [INF] [focuser-1] [pid=503185] lipp-driver-focuser-1: ---- Started ----
creating UNIX socket, path=lipp-driver-camera-1, timeout=5.0
10:56:03.272 [INF] [camera-1] [pid=503102] lipp-driver-camera-1: created POSIX socket path='lipp-driver-camera-1', timeout=5

I don't understand yet enough of the mechanics to inspect myself, but is the FastApi routers maker really supposed to take that much? Astropack startup should only account for a few additional seconds, not minutes.

EastEriq commented 6 months ago

IIUC the making of routers is (only?) https://github.com/blumzi/LAST_next_generation/blob/39ecbd7aa13c39959ca6fa26501b1e9cf7b35498/unit/unit-server.py#L28 . Now as a matter of fact

>> tic;obs.api.ApiBase.makeAuxiliaryFiles, toc
Elapsed time is 2.496747 seconds.

inside matlab, and

ocs@last11e:~/python/LAST_next_generation$ time last-matlab -nodisplay -nosplash -batch 'obs.api.ApiBase.makeAuxiliaryFiles; exit'
AstroPack startup addpath count: 34
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup

real    0m27.916s
user    0m28.804s
sys 0m1.844s

including all the matlab startup. That is still not two minutes. [-nodisplay -nosplash and exit btw are not required, since -batch]

EastEriq commented 5 months ago

On 1w that part of the process takes only 10 sec. Something is not consequent here

2024-03-21 14:31:30,834 - INFO     - {lipp-unit-mount:wait_for_ready:505363:mount-wait-for-ready-thread:140596928820992} -  started
2024-03-21 14:31:31,915 - INFO     - {last-unit-server:<module>:505363:MainThread:140597539637056} -  calling MATLAB FastApi routers maker with cmd="last-matlab -nodisplay -nosplash -batch 'obs.api.ApiBase.makeAuxiliaryFiles; exit'"
2024-03-21 14:31:31,918 - INFO     - {last-unit-server:<module>:505363:MainThread:140597539637056} -  Waiting for MATLAB FastApi routers maker
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
AstroPack startup addpath count: 33
AstroPack startup done: /home/ocs/matlab/AstroPack/matlab/startup/startup
14:31:40.395 [INF] [focuser-4] [pid=507111] lipp-driver-focuser-4: ---- Started ----
14:31:40.407 [INF] [camera-3] [pid=506974] lipp-driver-camera-3: ---- Started ----
14:31:40.419 [INF] [focuser-3] [pid=507084] lipp-driver-focuser-3: ---- Started ----
blumzi commented 5 months ago

Attempted to shorten load time by creating a prefdir/pathdef.m. It sometimes works (got 25 seconds instead of 2,5 minutes), but sometimes NOT :-(

EranOfek commented 5 months ago

In case you don't know this - check out: https://www.mathworks.com/matlabcentral/answers/717043-how-can-i-speed-up-my-matlab-startup-time

blumzi commented 5 months ago

much faster now

EastEriq commented 5 months ago

Uhm, after fb0c1ace5 startup on last11e is fast, but on 11w....

ocs@last11w:~/matlab/LAST/LAST_Handle$ ~/python/LAST_next_generation/bin/last-unit-server start
2024-03-24 16:25:59,299 - INFO     - {lipp-unit-camera-3:start_driver_process:1560475:MainThread:139869116245824} -  >>> Starting driver process, reason='first-time', self.cmd=['/usr/local/bin/matlab', '-batch', "obs.api.Lipp('EquipmentName', 'camera', 'EquipmentId', 3).loop()"], env={'NX_TEMP': '/tmp', 'SHELL': '/bin/bash', 'SESSION_MANAGER': 'local/last11w:@/tmp/.ICE-unix/11898,unix/last11w:/tmp/.ICE-unix/11898', 'COLORTERM': 'truecolor', 'XDG_CONFIG_DIRS': '/etc/xdg/xdg-ubuntu-xorg:/etc/xdg', 'ANSI_NORMAL': '\x1b[0;0m', 'LESS': '-ceFi', 'NX_SESSION_ID': '5A84166779D37AE1DDF2C19C84DDDC0B', 'no_proxy': '127.0.0.1,10.23.1.1,10.23.1.2,10.23.1.3,10.23.1.4,10.23.1.5,10.23.1.6,10.23.1.7,10.23.1.8,10.23.1.9,10.23.1.10,10.23.1.11,10.23.1.12,10.23.1.13,10.23.1.14,10.23.1.15,10.23.1.16,10.23.1.17,10.23.1.18,10.23.1.19,10.23.1.20,10.23.1.21,10.23.1.22,10.23.1.23,10.23.1.24,10.23.1.25,10.23.2.1,10.23.2.2,10.23.2.3,10.23.2.4,10.23.2.5,10.23.2.6,10.23.2.7,10.23.2.8,10.23.2.9,10.23.2.10,10.23.2.11,10.23.2.12,10.23.2.13,10.23.2.14,10.23.2.15,10.23.2.16,10.23.2.17,10.23.2.18,10.23.2.19,10.23.2.20,10.23.2.21,10.23.2.22,10.23.2.23,10.23.2.24,10.23.2.25,10.23.3.1,10.23.3.2,10.23.3.3,10.23.3.4,10.23.3.5,10.23.3.6,10.23.3.7,10.23.3.8,10.23.3.9,10.23.3.10,10.23.3.11,10.23.3.12,10.23.3.13,10.23.3.14,10.23.3.15,10.23.3.16,10.23.3.17,10.23.3.18,10.23.3.19,10.23.3.20,10.23.3.21,10.23.3.22,10.23.3.23,10.23.3.24,10.23.3.25,10.23.0.3,10.23.0.4,10.23.0.5,10.23.0.100,10.23.0.101,10.23.0.102,10.23.0.103', 'XDG_MENU_PREFIX': 'gnome-', 'ANSI_NORMAL_RED': '\x1b[0;31m', 'GNOME_DESKTOP_SESSION_ID': 'this-is-deprecated', 'NX_CUPS_BIN': '/usr/bin', 'ANSI_NORMAL_GREEN': '\x1b[0;32m', 'PULSE_RUNTIME_PATH': '/run/user/1001/pulse', 'LANGUAGE': 'en_IL:en', 'WGET_OPTIONS': '--output-file=/dev/null --quiet --timeout=1 --tries=1', 'ANSI_BG_BLACK': '\x1b[40m', 'GNOME_SHELL_SESSION_MODE': 'ubuntu', 'SSH_AUTH_SOCK': '/run/user/1001/keyring/ssh', 'ANSI_REVERSE': '\x1b[7m', 'LAST_TOOL_ROOT': '/usr/local/share/last-tool', 'ANSI_NORMAL_MAGENTA': '\x1b[0;35m', 'NX_ROOT': '/home/ocs/.nx', 'XMODIFIERS': '@im=ibus', 'DESKTOP_SESSION': 'ubuntu-xorg', 'ANSI_FLASH': '\x1b[5m', 'LAST_TOOL_DONTLOG': 'false', 'PWD': '/home/ocs/python/LAST_next_generation', 'LOGNAME': 'ocs', 'XDG_SESSION_TYPE': 'x11', 'ANSI_BRIGHT_MAGENTA': '\x1b[1;35m', 'NX_CLIENT': '/usr/NX/bin/nxclient', 'ANSI_NORMAL_BLUE': '\x1b[0;34m', 'ANSI_BG_GREEN': '\x1b[42m', 'XAUTHORITY': '/home/ocs/.Xauthority', 'PULSE_SCRIPT': '/home/ocs/.nx/nxdevice/D-1002-5A84166779D37AE1DDF2C19C84DDDC0B/audio/default.pa', 'GJS_DEBUG_TOPICS': 'JS ERROR;JS LOG', 'ANSI_BG_RED': '\x1b[41m', 'HOME': '/home/ocs', 'USERNAME': 'ocs', 'LANG': 'en_US', 'XDG_CURRENT_DESKTOP': 'ubuntu:GNOME', 'VIRTUAL_ENV': '/home/ocs/python/LAST_next_generation/.venv', 'VTE_VERSION': '6003', 'ANSI_BRIGHT_GREEN': '\x1b[1;32m', 'GNOME_TERMINAL_SCREEN': '/org/gnome/Terminal/screen/5cba0934_689c_4396_9295_bdcaea695135', 'https_proxy': 'http://bcproxy.weizmann.ac.il:8080', 'ANSI_BRIGHT_WHITE': '\x1b[1;37m', 'PULSE_CLIENTCONFIG': '/home/ocs/.nx/nxdevice/D-1002-5A84166779D37AE1DDF2C19C84DDDC0B/audio/client.conf', 'NX_CONNECTION': '132.77.39.181 52436 10.23.3.22 4000 ', 'ANSI_NORMAL_BLACK': '\x1b[0;30m', 'ANSI_NORMAL_CYAN': '\x1b[0;36m', 'NX_SYSTEM': '/usr/NX', 'GJS_DEBUG_OUTPUT': 'stderr', 'XDG_SESSION_CLASS': 'user', 'TERM': 'xterm-256color', 'ANSI_BG_YELLOW': '\x1b[43m', 'USER': 'ocs', 'LAST_MODULE_INCLUDE_PATH': '/usr/local/share/last-tool', 'ANSI_NORMAL_YELLOW': '\x1b[0;33m', 'GNOME_TERMINAL_SERVICE': ':1.89', 'ANSI_BRIGHT_CYAN': '\x1b[1;36m', 'DISPLAY': ':1002', 'SHLVL': '2', 'ANSI_BREAK': '\x1b[0;01m', 'ANSI_BRIGHT_YELLOW': '\x1b[1;33m', 'QT_IM_MODULE': 'ibus', 'VIRTUAL_ENV_PROMPT': '.venv', 'http_proxy': 'http://bcproxy.weizmann.ac.il:8080', 'LAST_TOOL_QUIET': 'false', 'XDG_RUNTIME_DIR': '/run/user/1001', 'ANSI_BG_WHITE': '\x1b[47m', 'PS1': '(.venv) ', 'PS4': '+ $(d=$(date --rfc-3339=ns); d=${d/ /@}; echo ${d:0:23}) [$SHLVL,$BASH_SUBSHELL] [${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:-main}] ', 'ANSI_BG_BLUE': '\x1b[44m', 'ANSI_BRIGHT_BLACK': '\x1b[1;30m', 'ANSI_BG_CYAN': '\x1b[46m', 'PULSE_CONFIG': '/home/ocs/.nx/nxdevice/D-1002-5A84166779D37AE1DDF2C19C84DDDC0B/audio/daemon.conf', 'XDG_DATA_DIRS': '/usr/share/ubuntu-xorg:/usr/local/share:/usr/share:/var/lib/snapd/desktop', 'GDK_BACKEND': 'x11', 'PATH': '/home/ocs/python/LAST_next_generation/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:MATLABROOT/bin', 'ANSI_NORMAL_WHITE': '\x1b[0;37m', 'MATLABROOT': '/usr/local/MATLAB/R2020b/bin', 'DBUS_SESSION_BUS_ADDRESS': 'unix:abstract=/tmp/dbus-EuSfVdKzeT,guid=f08297900ce6467e55f45f7565f2be78', 'ANSI_UNDERLINE': '\x1b[4m', 'MAIL': '/var/mail/ocs', 'PULSE_SERVER': '/home/ocs/.nx/nxdevice/D-1002-5A84166779D37AE1DDF2C19C84DDDC0B/audio/native.socket', 'ANSI_BG_MAGENTA': '\x1b[45m', 'ANSI_BRIGHT_RED': '\x1b[1;31m', 'NXDIR': '/usr/NX', 'OLDPWD': '/home/ocs/matlab/LAST/LAST_Handle', 'ANSI_BRIGHT_BLUE': '\x1b[1;34m', '_': '/home/ocs/python/LAST_next_generation/.venv/bin/python3', 'FROM_PYTHON_LIPP': '1'}
Traceback (most recent call last):
  File "unit/unit-server.py", line 13, in <module>
    from unit import unit_quit, unit_router
  File "/home/ocs/python/LAST_next_generation/unit/unit.py", line 13, in <module>
    from server.routers.camera import cameras
  File "/home/ocs/python/LAST_next_generation/unit/server/routers/camera.py", line 36, in <module>
    lipp.Driver(cameras, Equipment.Camera, id)
  File "/home/ocs/python/LAST_next_generation/unit/lipp.py", line 140, in __init__
    self.start_driver_process(reason='first-time')
  File "/home/ocs/python/LAST_next_generation/unit/lipp.py", line 149, in start_driver_process
    self.driver_process = Popen(args=self.cmd, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/matlab'
2024-03-24 16:25:59,332 - INFO     - {lipp:resurrect:1560475:MainThread:139869116245824} -  >>> Resurrecting LIPP driver Camera[3] ...
Exception ignored in: <function Driver.__del__ at 0x7f35ced670d0>
Traceback (most recent call last):
  File "/home/ocs/python/LAST_next_generation/unit/lipp.py", line 369, in __del__
  File "/home/ocs/python/LAST_next_generation/unit/lipp.py", line 406, in resurrect
  File "/usr/lib/python3.8/logging/__init__.py", line 1446, in info
  File "/usr/lib/python3.8/logging/__init__.py", line 1589, in _log
  File "/usr/lib/python3.8/logging/__init__.py", line 1599, in handle
  File "/usr/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
  File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
  File "/home/ocs/python/LAST_next_generation/unit/utils.py", line 128, in emit
  File "/home/ocs/python/LAST_next_generation/unit/utils.py", line 114, in make_file_name
ImportError: sys.meta_path is None, Python is likely shutting down

reopening...

EastEriq commented 5 months ago

Trivially, 11e and 11w differ on which matlab:

ocs@last11e:~$ which matlab
/usr/local/bin/matlab
ocs@last11e:~$ ls -l /usr/local/bin/matlab*
lrwxrwxrwx 1 root root 35 Mar  9  2021 /usr/local/bin/matlab -> /usr/local/MATLAB/R2020b/bin/matlab
lrwxrwxrwx 1 root root 35 Oct  6  2022 /usr/local/bin/matlab-R2022a -> /usr/local/MATLAB/R2022a/bin/matlab
ocs@last11w:~$ which matlab
/usr/bin/matlab
ocs@last11w:~$ ls /usr/local/bin/m*
/usr/local/bin/matlab-R2020b
ocs@last11w:~$ ls /usr/bin/mat*
/usr/bin/matlab  /usr/bin/matlab-mbuild  /usr/bin/matlab-mex  /usr/bin/mattrib
ocs@last11w:~$ ls -l /usr/local/bin/mat*
lrwxrwxrwx 1 root root 35 Feb  6  2023 /usr/local/bin/matlab-R2020b -> /usr/local/MATLAB/R2020b/bin/matlab

Maybe this is an artefact of having installed 2022a only on 11e. ln -s /usr/local/bin/matlab-R2020b /usr/local/bin/matlab solves the issue, but let's remember it.