UCLA-Rocket-Project / OLD-Ares2022-2023

Central Repository for Ares Software

Develop script to automate restart of RPi services. #41

Closed harrisonCassar closed 1 year ago

harrisonCassar commented 1 year ago

Motivation

Currently, we have to manually type in commands to restart services when they go down or need a reset. This is manageable for someone who knows what they're doing or has a reference/cheat-sheet, but we should make the process easier for an operator.

A Bash or Python script that is executable from the command line would probably be best. It would also help support the deployment infrastructure implemented with #16.

Initial Notes

Relevant Services

In the current implementation, we have a number of relevant services:

Relevant Service Commands

# To list all (enabled) system services
systemctl list-unit-files | grep enabled

# To get the status of a service
systemctl status NAME.service

# To restart a service
systemctl restart NAME.service

# To stop a service
systemctl stop NAME.service

# To start a service
systemctl start NAME.service
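
As a rough illustration of how a Python script could wrap the commands above, here is a minimal sketch. The function names `build_systemctl_command` and `run_systemctl` are hypothetical and not taken from manage_services.py; the only thing assumed about systemctl is the `systemctl OP NAME.service` form shown above.

```python
import subprocess

# Operations taken from the command list above.
SUPPORTED_OPS = ("status", "start", "stop", "restart")


def build_systemctl_command(op, name):
    """Return the argv list for `systemctl OP NAME.service`."""
    if op not in SUPPORTED_OPS:
        raise ValueError(f"unsupported operation: {op}")
    return ["systemctl", op, f"{name}.service"]


def run_systemctl(op, name):
    """Run the command and return systemctl's exit code."""
    # check=False: `systemctl status` exits non-zero for inactive units,
    # which should not be treated as a failure of the script itself.
    completed = subprocess.run(build_systemctl_command(op, name), check=False)
    return completed.returncode
```

Passing the command as a list (rather than a shell string) avoids any quoting issues with service names.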

Implemented Interface

Help message:

rocket@rocket:~ $ python3 ./manage_services.py -h
usage: manage_services.py [-h] [-i ID [ID ...]] [--log LOG] [--log-level LOG_LEVEL] OP [GROUP ...]

Manage Ground Systems' RPi server's Linux services empowering all Controls/DAQ subsystem functionalities.

positional arguments:
  OP                    Operation to perform on specified Services. Possible values: status (Show runtime status
                        information about service), start (Start (activate) service), stop (Stop (deactivate)
                        service), restart (Stop, and then start service. If a service was not yet running already, it
                        will be started), try-restart (Stop, and then start service. If a service is not yet running
                        already, this operation will do nothing).
  GROUP                 Service Group IDs of interest to operate on. Possible values: data (Services that are related
                        to near-device data collection/processing Includes: adc, cctv, tc), server (Services that are
                        related to the running of the server and its facilitation responsibilities Includes: grafana,
                        grafana-image, server-flask, server-telegraf), all (Encompasses ALL services Includes: adc,
                        cctv, grafana, grafana-image, tc, server-flask, server-telegraf).

optional arguments:
  -h, --help            show this help message and exit
  -i ID [ID ...], --individual ID [ID ...]
                        Individual Service API IDs of interest to operate on. Possible values: adc (Run the ADC binary
                        to poll, process, and log all data acquired by the ADC), cctv (Performs live-streaming of CCTV
                        camera video output), grafana (Runs the Grafana front-end server), grafana-image (Runs a
                        simple Flask server for images to be displayed on our main Grafana front-end dashboards), tc
                        (Run a Python script that processes and logs all data acquired by the Thermocouples), server-
                        flask (Runs our main Flask server that facilitates communication between the Pad and Bunker
                        for the Controls and DAQ subsystems), server-telegraf (Runs Telegraf to mediate the flow of
                        data (not currently actively used)).
  --log LOG             Path to file for logging output (instead of only stdout).
  --log-level LOG_LEVEL
                        Provide logging level (i.e. DEBUG, INFO, WARNING, etc.). Default: INFO

The script supports the services listed above and the following operations, whose semantics follow the Linux man page for systemctl: status, start, stop, restart, try-restart.
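
For reference, an interface with this shape can be sketched with argparse roughly as follows. This is not the actual manage_services.py source: the helper `one_of`, the function `build_parser`, and the destination names are hypothetical, and only the IDs, operations, and option names are taken from the help message above.

```python
import argparse

# API IDs, group IDs, and operations as listed in the help message above.
SERVICE_IDS = ("adc", "cctv", "grafana", "grafana-image", "tc",
               "server-flask", "server-telegraf")
GROUP_IDS = ("data", "server", "all")
OPERATIONS = ("status", "start", "stop", "restart", "try-restart")


def one_of(valid, description):
    """Build an argparse `type` callable that rejects unknown values."""
    def check(value):
        if value not in valid:
            # argparse prefixes this message with the offending argument name.
            raise argparse.ArgumentTypeError(
                f"{value} is not a valid {description}.")
        return value
    return check


def build_parser():
    parser = argparse.ArgumentParser(prog="manage_services.py")
    parser.add_argument("op", metavar="OP",
                        type=one_of(OPERATIONS, "operation"))
    parser.add_argument("groups", metavar="GROUP", nargs="*",
                        type=one_of(GROUP_IDS, "Service Group ID"))
    parser.add_argument("-i", "--individual", metavar="ID", nargs="+",
                        default=[],
                        type=one_of(SERVICE_IDS,
                                    "API ID for a supported Service"))
    parser.add_argument("--log", help="Path to file for logging output.")
    parser.add_argument("--log-level", default="INFO")
    return parser
```

Using a `type` callable instead of `choices` keeps the usage line short and lets the error message name the rejected value.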

We also support applying operations onto pre-defined "groups" of services, grouped by general functionality.
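
One way the group handling could work is sketched below, assuming groups are simply expanded into their member IDs and merged with any IDs passed via -i. The names `SERVICE_GROUPS` and `resolve_targets` are hypothetical; the group membership is copied from the help message above.

```python
# Group membership as listed in the help message above.
SERVICE_GROUPS = {
    "data": ["adc", "cctv", "tc"],
    "server": ["grafana", "grafana-image", "server-flask", "server-telegraf"],
    "all": ["adc", "cctv", "grafana", "grafana-image", "tc",
            "server-flask", "server-telegraf"],
}


def resolve_targets(groups, individual_ids):
    """Expand groups and merge individual IDs, keeping first-seen order."""
    ordered = []
    for group in groups:
        ordered.extend(SERVICE_GROUPS[group])
    ordered.extend(individual_ids)
    # dict.fromkeys drops repeats while preserving insertion order, so a
    # service named both by a group and by -i is only operated on once.
    return list(dict.fromkeys(ordered))
```

For example, `resolve_targets(["data"], ["adc", "grafana"])` yields adc, cctv, tc, and grafana, with adc appearing only once.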

Dependencies

Blocked by #36. Blocks #16.

harrisonCassar commented 1 year ago

Interactive session:

rocket@rocket:~ $ python3 ./manage_services.py -h
usage: manage_services.py [-h] [-i ID [ID ...]] [--log LOG] [--log-level LOG_LEVEL] OP [GROUP ...]

Manage Ground Systems' RPi server's Linux services empowering all Controls/DAQ subsystem functionalities.

positional arguments:
  OP                    Operation to perform on specified Services. Possible values: status (Show runtime status
                        information about service); start (Start (activate) service); stop (Stop (deactivate)
                        service); restart (Stop, and then start service. If a service was not yet running already, it
                        will be started); try-restart (Stop, and then start service. If a service is not yet running
                        already, this operation will do nothing).
  GROUP                 Service Group IDs of interest to operate on. Possible values: data (Services that are related
                        to near-device data collection/processing Includes: adc, cctv, tc); server (Services that are
                        related to the running of the server and its facilitation responsibilities Includes: grafana,
                        grafana-image, server-flask, server-telegraf); all (Encompasses ALL services Includes: adc,
                        cctv, grafana, grafana-image, tc, server-flask, server-telegraf).

optional arguments:
  -h, --help            show this help message and exit
  -i ID [ID ...], --individual ID [ID ...]
                        Individual Service API IDs of interest to operate on. Possible values: adc (Run the ADC binary
                        to poll, process, and log all data acquired by the ADC); cctv (Performs live-streaming of CCTV
                        camera video output); grafana (Runs the Grafana front-end server); grafana-image (Runs a
                        simple Flask server for images to be displayed on our main Grafana front-end dashboards); tc
                        (Run a Python script that processes and logs all data acquired by the Thermocouples); server-
                        flask (Runs our main Flask server that facilitates communication between the Pad and Bunker
                        for the Controls and DAQ subsystems); server-telegraf (Runs Telegraf to mediate the flow of
                        data (not currently actively used)).
  --log LOG             Path to file for logging output (instead of only stdout).
  --log-level LOG_LEVEL
                        Provide logging level (i.e. DEBUG, INFO, WARNING, etc.). Default: INFO
rocket@rocket:~ $ python3 ./manage_services.py status -i image
usage: manage_services.py [-h] [-i ID [ID ...]] [--log LOG] [--log-level LOG_LEVEL] OP [GROUP ...]
manage_services.py: error: argument -i/--individual: image is not a valid API ID for a supported Service.
rocket@rocket:~ $ python3 ./manage_services.py status -i cctv
● cctv.service - CCTV
     Loaded: loaded (/etc/systemd/system/cctv.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-05-05 01:54:17 PDT; 15h ago
   Main PID: 423 (main)
      Tasks: 3 (limit: 8986)
        CPU: 22.559s
     CGroup: /system.slice/cctv.service
             └─423 ustreamer --port=4567 --host=0.0.0.0 --device=/dev/video0

May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.949      main] -- Using internal blank placeholder
May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.950      main] -- Listening HTTP on [0.0.0.0]:4567
May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.950    stream] -- Using V4L2 device: /dev/video0
May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.951    stream] -- Using desired FPS: 0
May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.951      http] -- Starting HTTP eventloop ...
May 05 01:54:17 rocket cctv.service[423]: =============================================================================>
May 05 01:54:17 rocket cctv.service[423]: -- ERROR [7.952    stream] -- Can't access device: No such file or directory
May 05 01:54:17 rocket cctv.service[423]: -- INFO  [7.952    stream] -- Waiting for the device access ...
May 05 01:54:17 rocket systemd[1]: Started CCTV.
rocket@rocket:~ $ python3 ./manage_services.py restart -i adc tc
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to restart 'adc.service'.
Authenticating as: ,,, (rocket)
Password:
==== AUTHENTICATION COMPLETE ===
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to restart 'tc.service'.
Authenticating as: ,,, (rocket)
Password:
==== AUTHENTICATION COMPLETE ===
rocket@rocket:~ $ python3 ./manage_services.py status -i adc tc
● adc.service - ADC
     Loaded: loaded (/etc/systemd/system/adc.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) since Fri 2023-05-05 17:23:25 PDT; 3s ago
    Process: 7786 ExecStart=/home/rocket/binaries/adc (code=exited, status=0/SUCCESS)
   Main PID: 7786 (code=exited, status=0/SUCCESS)
        CPU: 9ms

May 05 17:23:25 rocket systemd[1]: adc.service: Succeeded.
May 05 17:23:29 rocket systemd[1]: adc.service: Scheduled restart job, restart counter is at 127.
May 05 17:23:29 rocket systemd[1]: Stopped ADC.
May 05 17:23:29 rocket systemd[1]: Started ADC.
● tc.service - Thermocouple
     Loaded: loaded (/etc/systemd/system/tc.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-05-05 17:23:29 PDT; 241ms ago
   Main PID: 7793 (python3)
      Tasks: 1 (limit: 8986)
        CPU: 58ms
     CGroup: /system.slice/tc.service
             └─7793 python3 /home/rocket/binaries/tc.py

May 05 17:23:29 rocket systemd[1]: Started Thermocouple.
rocket@rocket:~ $
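
Note that adc.service in the session above reports "activating (auto-restart)" with a restart counter of 127, so a restart command returning does not mean the service stays up. A possible follow-up check is sketched below. It is not part of manage_services.py: the function names and the `max_restarts` threshold are hypothetical, and it assumes the installed systemd exposes the ActiveState, SubState, and NRestarts properties through `systemctl show`.

```python
import subprocess


def parse_show_output(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    properties = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            properties[key] = value
    return properties


def is_crash_looping(properties, max_restarts=5):
    """Heuristic: unit is waiting on auto-restart or has restarted often."""
    restarts = int(properties.get("NRestarts") or 0)
    return properties.get("SubState") == "auto-restart" or restarts > max_restarts


def query_unit(name):
    """Fetch the state properties for NAME.service (requires systemd)."""
    result = subprocess.run(
        ["systemctl", "show", f"{name}.service",
         "--property=ActiveState,SubState,NRestarts"],
        capture_output=True, text=True, check=False)
    return parse_show_output(result.stdout)
```

An operator-facing script could call `is_crash_looping(query_unit("adc"))` after a restart and print a warning instead of silently returning.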