EranOfek closed this issue 2 months ago.
last-pipeline is a systemd service. The service can be configured to expect a notification at a pre-defined time interval and to restart the service if it is not notified in time. To achieve this, the pipeline needs to call the `sd_notify` C library function periodically. If you can call this function periodically from within the pipeline, I can add the implementation.
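For reference, `sd_notify` is a thin wrapper around a simple protocol. systemd passes the path of an `AF_UNIX` datagram socket in the `NOTIFY_SOCKET` environment variable, and the service sends newline-separated `VAR=VALUE` assignments such as `READY=1` or `WATCHDOG=1` to it. The Python sketch below only illustrates that protocol. It is not the AstroPack/MATLAB implementation, and the name `sd_notify` here is a local stand-in for the C call.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one sd_notify-style message, e.g. "READY=1" or "WATCHDOG=1".

    Returns False without doing anything when NOTIFY_SOCKET is unset,
    i.e. when the process was not started by systemd with notify support.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.sendall(state.encode())
    return True
```

A readiness message that also reports the PID to monitor could presumably look like `sd_notify(f"READY=1\nMAINPID={os.getpid()}")`, since `MAINPID=` is a standard sd_notify field.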
Do you perhaps mean shelling out from MATLAB with `system('systemd-notify --whatever_options')`, instead of calling a .so library function? Interfacing with a library is possible in MATLAB (we do it all the time with SDKs), but it involves some complication.
https://askubuntu.com/questions/1120023/how-to-use-systemd-notify
That's possible as well. It is somewhat less accurate and spawns an additional process, but we can use it.
(In reply to EastEriq, Fri, Jan 5, 2024, 11:13 AM.)
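The shell-out alternative amounts to spawning `systemd-notify` with the assignments as arguments. The sketch below uses Python's `subprocess` in place of MATLAB's `system(...)`; `notify_via_cli` and its `dry_run` flag are hypothetical names. One caveat beyond the extra process: the message comes from the short-lived `systemd-notify` process rather than the service's main process, and when `WatchdogSec=` is set without an explicit `NotifyAccess=`, systemd accepts messages only from the main process, so the unit would likely need `NotifyAccess=all`.

```python
import subprocess
from typing import List


def notify_via_cli(*assignments: str, dry_run: bool = False) -> List[str]:
    """Send sd_notify assignments (e.g. "WATCHDOG=1") by spawning systemd-notify.

    Python analogue of MATLAB's system('systemd-notify ...').
    With dry_run=True the command line is only built, not executed.
    """
    argv = ["systemd-notify", *assignments]
    if not dry_run:
        # Spawns a separate, short-lived process for every notification.
        subprocess.run(argv, check=True)
    return argv
```

For example, `notify_via_cli("--ready")` would run `systemd-notify --ready`, and `notify_via_cli("WATCHDOG=1")` would send a watchdog keep-alive.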
Pushed the following solution:

* As discussed with Eran:
  * We will have two systemd services (`last-pipeline1` and `last-pipeline2`), one per DataDir. Each will be monitored individually by systemd.
  * We may have more in the future (e.g. PAST).
* Added to AstroPack the capability to send `sd_notify` messages:
  * When each of the `last-pipeline[12]` services is started, the bash script intrinsically calls `tools.systemd.mex.notify_ready`, which informs systemd:
    * that the service is ready and will start its main workload;
    * which process ID needs to be monitored.
  * The pipeline (MATLAB) code is responsible for calling `tools.systemd.mex.notify_watchdog` at intervals of less than 1800 seconds. It can tell whether it was run by systemd by checking for the environment variable `SYSTEMD`, but the `notify_xxx` functions do nothing if it is not set, so they are safe to call in a regular MATLAB session.

The systemd service files (`/etc/systemd/system/last-pipeline[12]`) now look as follows:

    [Unit]
    Description=LAST pipeline service (1 of 2)

    [Service]
    User=ocs
    WorkingDirectory=/home/ocs/matlab
    ExecStart=/usr/local/share/last-tool/bin/last-pipeline start 1
    ExecStop=/usr/local/share/last-tool/bin/last-pipeline stop 1
    Restart=always
    Environment="SYSTEMD=1"
    WatchdogSec=1800

    [Install]
    WantedBy=multi-user.target
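The watchdog contract above (ping well within `WatchdogSec=1800`, do nothing outside systemd) can be sketched as a loop in Python. This is an illustration under assumptions, not the AstroPack code: `run_with_watchdog`, the work-step callables and the `send` callback (standing in for something like `tools.systemd.mex.notify_watchdog`) are hypothetical. It reads `WATCHDOG_USEC`, which systemd exports to services that have `WatchdogSec=` set, and pings every half timeout as recommended in `sd_watchdog_enabled(3)`.

```python
import os
import time
from typing import Callable, Iterable


def watchdog_interval_s(default_timeout_s: float = 1800.0) -> float:
    # systemd exports WATCHDOG_USEC (microseconds) when WatchdogSec= is set;
    # fall back to the unit's 1800 s otherwise.
    usec = os.environ.get("WATCHDOG_USEC")
    timeout_s = int(usec) / 1_000_000 if usec else default_timeout_s
    # sd_watchdog_enabled(3) recommends pinging every half of the timeout.
    return timeout_s / 2


def run_with_watchdog(
    steps: Iterable[Callable[[], None]],
    send: Callable[[str], object],
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run pipeline steps, sending WATCHDOG=1 whenever the interval elapses.

    Mirrors the described notify_xxx behaviour: when SYSTEMD is not set,
    nothing is sent, so the loop is safe in an ordinary session.
    Returns the number of keep-alive pings sent.
    """
    enabled = bool(os.environ.get("SYSTEMD"))
    interval = watchdog_interval_s()
    last_ping = clock()
    pings = 0
    for step in steps:
        step()  # one unit of pipeline work
        now = clock()
        if enabled and now - last_ping >= interval:
            send("WATCHDOG=1")
            last_ping = now
            pings += 1
    return pings
```

Because pings happen only between steps, the worst-case gap between two pings is the interval plus the duration of one step. With a 1800 s timeout and 900 s interval, any single step that runs longer than about 900 s lets the watchdog fire, which is the intended outcome for a step that is stuck.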
* The (new) tool `last-pipeline` can now show the status of the `last-pipeline` services. The following example shows both of them not running:

```bash
ocs@last12w:/home/ocs# last-pipeline status
Unit last-pipeline1.service could not be found.
Unit last-pipeline2.service could not be found.
```
Thanks, can you move this function to `tools.os`?
(In reply to Arie Blumenzweig, Tue, Jan 16, 2024, 1:35 PM.)
`last-pipeline1` and `last-pipeline2` are now services.
The problem: MATLAB gets stuck, so the process is still alive but not operational. Suggested solution: the pipeline will write a status file (where?) at least every 30 min. A crontab script will check when this file was last updated and, if that was more than 40 min ago, kill the MATLAB process. The pipeline service will then start a new process.
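The heartbeat-file proposal can be sketched in Python as two halves: the pipeline rewrites a file at least every 30 minutes, and a checker run from cron kills the process once the file's modification time is more than 40 minutes old. All names are hypothetical, the file location is left as a parameter (the proposal leaves it open), and the checker is assumed to already know the PID of the MATLAB process.

```python
import os
import signal
import time
from typing import Callable, Optional

STALE_AFTER_S = 40 * 60  # kill threshold from the proposal (> 40 min)


def touch_heartbeat(path: str) -> None:
    """Pipeline side: rewrite the file (updating its mtime) at least every 30 min."""
    with open(path, "w") as f:
        f.write(f"{time.time():.0f}\n")


def heartbeat_age_s(path: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the heartbeat file was last modified, or None if it is absent."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return None
    return (time.time() if now is None else now) - mtime


def check_and_kill(
    path: str,
    pid: int,
    now: Optional[float] = None,
    kill: Callable[[int, int], None] = os.kill,
    sig: int = signal.SIGTERM,
) -> bool:
    """Cron side: signal `pid` if the heartbeat is stale; return True if it did."""
    age = heartbeat_age_s(path, now)
    # A missing file is treated as "not started yet" rather than stale.
    if age is not None and age > STALE_AFTER_S:
        kill(pid, sig)
        return True
    return False
```

With cron running the check every few minutes, a hung process would be killed roughly 40 minutes plus one cron period after its last heartbeat, after which `Restart=always` should bring the service back. `SIGTERM` is the default here; a MATLAB process that is truly hung may need `signal.SIGKILL` instead.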