interduo opened 2 months ago
This is also an lqos_scheduler problem:
What I did:
journalctl --vacuum-time=15min --rotate
reboot
journalctl -u lqos_scheduler
-- Boot b0a56b4b200144f3802715e47c588a83 --
Jul 12 09:24:31 libreqos-beta systemd[1]: Starting lqos_scheduler.service...
Jul 12 09:24:31 libreqos-beta python3[943]: thread '<unnamed>' panicked at lqos_python/src/lib.rs:269:70:
Jul 12 09:24:31 libreqos-beta python3[943]: called `Result::unwrap()` on an `Err` value: Socket (typically /run/lqos/bus) not found. Check that lqosd is running, and you have permi>
Jul 12 09:24:31 libreqos-beta python3[943]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 12 09:24:31 libreqos-beta python3[943]: Running Python Version 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]
Jul 12 09:24:31 libreqos-beta python3[943]: refreshShapers starting at 12/07/2024 09:24:31
Jul 12 09:24:31 libreqos-beta python3[943]: First time run since system boot.
Jul 12 09:24:31 libreqos-beta python3[943]: Validating input files 'ShapedDevices.csv' and 'network.json'
Jul 12 09:24:33 libreqos-beta python3[943]: Traceback (most recent call last):
Jul 12 09:24:33 libreqos-beta python3[943]: File "/opt/libreqos/src/scheduler.py", line 69, in <module>
Jul 12 09:24:33 libreqos-beta python3[943]: importAndShapeFullReload()
Jul 12 09:24:33 libreqos-beta python3[943]: File "/opt/libreqos/src/scheduler.py", line 62, in importAndShapeFullReload
Jul 12 09:24:33 libreqos-beta python3[943]: refreshShapers()
Jul 12 09:24:33 libreqos-beta python3[943]: File "/opt/libreqos/src/LibreQoS.py", line 448, in refreshShapers
Jul 12 09:24:33 libreqos-beta python3[943]: if (validateNetworkAndDevices() == True):
Jul 12 09:24:33 libreqos-beta python3[943]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 12 09:24:33 libreqos-beta python3[943]: File "/opt/libreqos/src/LibreQoS.py", line 130, in validateNetworkAndDevices
Jul 12 09:24:33 libreqos-beta python3[943]: rustValid = validate_shaped_devices()
Jul 12 09:24:33 libreqos-beta python3[943]: ^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 12 09:24:33 libreqos-beta python3[943]: pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Socket (typically /run/lqos/bus) not found. Check that lqosd i>
Jul 12 09:24:33 libreqos-beta systemd[1]: lqos_scheduler.service: Main process exited, code=exited, status=1/FAILURE
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Failed with result 'exit-code'.
Jul 12 09:24:41 libreqos-beta systemd[1]: Failed to start lqos_scheduler.service.
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Consumed 2.281s CPU time.
Jul 12 09:24:41 libreqos-beta systemd[1]: lqos_scheduler.service: Scheduled restart job, restart counter is at 1.
Jul 12 09:24:41 libreqos-beta systemd[1]: Starting lqos_scheduler.service...
Setting ExecStartPre=/bin/sleep 60 in lqos_scheduler.service helps with that.
Temporary solution: https://github.com/LibreQoE/LibreQoS/pull/522/
It doesn't require implementing anything in lqosd.
The good news is that with UI2 there's no more Rocket or separate node_manager daemon, so the Rocket side of things is going away. The scheduler needs an "is lqosd running? If not, delay" check; that should be easy enough.
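A minimal sketch of that check, assuming the scheduler only needs to wait for the lqosd bus socket to appear before calling into lqos_python. The path /run/lqos/bus is taken from the panic message above; the helper name wait_for_lqosd and the timeout and interval values are hypothetical. Testing that the file exists mirrors the "not found" condition in the error, but it does not prove lqosd is already accepting requests.

```python
import os
import time

# Path taken from the panic message ("typically /run/lqos/bus");
# an assumption, adjust if your install differs.
LQOSD_BUS_SOCKET = "/run/lqos/bus"


def wait_for_lqosd(socket_path=LQOSD_BUS_SOCKET, timeout=120.0, interval=2.0):
    """Poll until the lqosd bus socket exists or the timeout expires.

    Hypothetical helper, not part of LibreQoS.
    Returns True if the socket appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(socket_path):
            return True
        if time.monotonic() >= deadline:
            return False
        # lqosd not up yet: back off and check again.
        time.sleep(interval)
```

The scheduler could call this before importAndShapeFullReload() and exit with a clear message on timeout, instead of panicking inside validate_shaped_devices(). Polling against a deadline also avoids guessing a fixed ExecStartPre sleep.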
Well, I think this should be done at the systemd level; that is also what it was created for.
OK, the situation now is: the scheduler doesn't start because lqosd isn't running; lqosd doesn't start because the QSFP+ link isn't up yet (sometimes it spends a few seconds negotiating the connection); the scheduler gives up and logs an error in dmesg saying it could not be started. I then started lqosd manually, followed by the scheduler.
lqos_scheduler should perhaps check the link state before checking for lqosd: if the interfaces are not up, just sleep for a while and check again.
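One way to sketch the link check, assuming Linux sysfs: the kernel exposes each interface's operational state in /sys/class/net/<iface>/operstate. The helper names and timeout are hypothetical, and the interface names would have to come from the LibreQoS configuration. Some virtual interfaces (loopback, for example) report "unknown" rather than "up", so this strict check suits physical ports such as the QSFP+ interfaces here.

```python
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def link_is_up(iface, sysfs_root=SYSFS_NET):
    """Return True if the kernel reports the interface's operstate as 'up'.

    Hypothetical helper, not part of LibreQoS.
    """
    try:
        state = (sysfs_root / iface / "operstate").read_text().strip()
    except FileNotFoundError:
        return False  # interface does not exist (yet)
    return state == "up"


def wait_for_links(interfaces, timeout=120.0, interval=2.0, sysfs_root=SYSFS_NET):
    """Poll until every interface is up.

    Returns the list of interfaces still down when the deadline passes
    (an empty list means all links came up).
    """
    deadline = time.monotonic() + timeout
    while True:
        down = [i for i in interfaces if not link_is_up(i, sysfs_root)]
        if not down or time.monotonic() >= deadline:
            return down
        # Links may still be negotiating: sleep and check again.
        time.sleep(interval)
```

Because wait_for_links returns the interfaces that never came up, the caller can log exactly which port is the problem instead of a generic startup failure.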
After a reboot there is always a problem. After running
systemctl restart lqos_node_manager
everything starts perfectly.
Solution 1 ("temporary"): add ExecStartPre=/bin/sleep 10 to the systemd service unit file:
cat /etc/systemd/system/lqos_node_manager.service
Solution 2 (the proper fix): use the notify service type (Type=notify).
What do you think?
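To illustrate Solution 2: with Type=notify, systemd considers a service started only after the process sends READY=1 to the datagram socket named in $NOTIFY_SOCKET (the sd_notify(3) protocol), and units ordered After= it wait for that message. lqosd and the node manager are Rust programs, so the Python function below only sketches the wire protocol; the function name notify_systemd is hypothetical. The daemon would have to send READY=1 only once its bus socket is actually listening, and its unit file would need Type=notify.

```python
import os
import socket


def notify_systemd(message: str = "READY=1") -> bool:
    """Send a state message to systemd over $NOTIFY_SOCKET (sd_notify protocol).

    Hypothetical helper; only AF_UNIX socket addresses are handled.
    Returns False without doing anything when $NOTIFY_SOCKET is unset,
    i.e. when the process is not running under a Type=notify unit.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket (leading NUL byte).
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), addr)
    return True
```

Compared with a fixed ExecStartPre sleep, readiness notification lets dependent units start as soon as the daemon is really ready, and no later.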