containers / qm

QM is a containerized environment for running Functional Safety qm (Quality Management) software
https://github.com/containers/qm
GNU General Public License v2.0

improve agent-flood #629

Open pengshanyu opened 4 weeks ago

pengshanyu commented 4 weeks ago

The following problem sometimes occurs:

$ podman exec qm systemctl status bluechi-tester-1

× bluechi-tester-1.service - bluechi-tester-1
     Loaded: loaded (/etc/containers/systemd/bluechi-tester-1.container; generated)
     Active: failed (Result: timeout) since Wed 2024-10-23 07:15:19 UTC; 18s ago
    Process: 151 ExecStart=/usr/bin/podman run --name=systemd-bluechi-tester-1 --cidfile=/run/bluechi-tester-1.cid --replace --rm --cgroups=split --network=host --sdnotify=conmon -d dir:/var/lib/containers/registry/tools-ffi:latest /root/tests/FFI/bin/bluechi-tester --url=tcp:host=10.26.28.155,port=842 --nodename=bluechi-tester-1 --numbersignals=11111111 --signal=JobDone (code=exited, status=0/SUCCESS)
    Process: 176 ExecStopPost=/usr/bin/podman rm -v -f -i --cidfile=/run/bluechi-tester-1.cid (code=exited, status=0/SUCCESS)
   Main PID: 151 (code=exited, status=0/SUCCESS)
        CPU: 2min 4.318s

Oct 23 07:13:24 27c251633eae systemd[1]: Starting bluechi-tester-1...
Oct 23 07:13:24 27c251633eae bluechi-tester-1[151]: Getting image source signatures
Oct 23 07:13:24 27c251633eae bluechi-tester-1[151]: Copying blob sha256:191c327d78ba03afe30fb233a6b18249870076d0ddee6c7801206fb064908857
Oct 23 07:13:24 27c251633eae bluechi-tester-1[151]: Copying blob sha256:d65c66ea6e0ce9c41b9b9119ab07b00bb97f4de33bb0ce95558991ebb961af83
Oct 23 07:14:54 27c251633eae systemd[1]: bluechi-tester-1.service: start operation timed out. Terminating.
Oct 23 07:15:19 27c251633eae systemd[1]: bluechi-tester-1.service: Failed with result 'timeout'.
Oct 23 07:15:19 27c251633eae systemd[1]: Failed to start bluechi-tester-1.
Oct 23 07:15:19 27c251633eae systemd[1]: bluechi-tester-1.service: Consumed 2min 4.318s CPU time.

Also, the Bluechi developers suggest using an unprivileged port (1024 or higher), e.g. 8420, because 842 is a privileged port. The port must be changed in the ffi-tools image as well:

Change it here for the controller: https://gitlab.com/CentOS/automotive/container-images/ffi-tools/-/blob/main/files/etc/bluechi/controller.conf.d/00-default.conf?ref_type=heads#L17

Add ControllerPort=8420 here for the agent: https://gitlab.com/CentOS/automotive/container-images/ffi-tools/-/blob/main/files/etc/bluechi/agent.conf.d/00-default.conf?ref_type=heads#L3
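
As a rough sketch, the two ffi-tools changes could look something like the fragment below. The section headers `[bluechi-controller]` and `[bluechi-agent]` are an assumption based on BlueChi's configuration format and were not checked against the linked files; only `ControllerPort=8420` comes from the suggestion itself.

```ini
# files/etc/bluechi/controller.conf.d/00-default.conf
# Section name is an assumption; keep whatever header the file already uses.
[bluechi-controller]
ControllerPort=8420

# files/etc/bluechi/agent.conf.d/00-default.conf
# Section name is an assumption; keep whatever header the file already uses.
[bluechi-agent]
ControllerPort=8420
```

Note that the bluechi-tester command in the status output above passes `port=842` in its `--url` argument, so that value would presumably need to be updated to match.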

Yarboa commented 3 weeks ago

@pengshanyu, this may be the case if they were running on the same pipeline as #630.

The disk is full and unrecoverable, so once #630 is fixed, this issue will not occur again.

pengshanyu commented 3 weeks ago

Hi @Yarboa, thanks for the investigation. The Bluechi developers suggested that we change the port from 842 to 8420. Do you think it is safe to change it now? @dougsland @Yarboa

pengshanyu commented 3 weeks ago

This is the MR to change the Bluechi controller port in ffi-tools to 8420.