kirk444 closed this 8 months ago
Thanks for the strace, that's very helpful. It seems the issue is related to the latest change in naemon to use UUIDs as problem IDs. I'll have a look.
After updating just the packages on Ubuntu 22.04 (e.g. apt-get upgrade; apt-get autoremove) and then rebooting, OMD doesn't come up again. Not sure how to fix it?
root@:~# omd start bgde
/omd/sites/bgde/etc/rc.d/20-influxdb: line 15: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/20-influxdb: line 102: __generic_init: command not found
/omd/sites/bgde/etc/rc.d/80-naemon: line 14: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/80-naemon: line 38: merge-core-config: command not found
naemon configuration file /omd/sites/bgde/tmp/naemon/naemon.cfg not found. Terminating...
/omd/sites/bgde/etc/rc.d/85-apache: line 16: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/85-apache: line 72: generic_init: command not found
/omd/sites/bgde/etc/rc.d/85-nagflux: line 16: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/85-nagflux: line 32: generic_init: command not found
/omd/sites/bgde/etc/rc.d/90-grafana: line 15: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/90-grafana: line 38: generic_init: command not found
/omd/sites/bgde/etc/rc.d/90-xinetd: line 14: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/90-xinetd: line 41: generic_init: command not found
/omd/sites/bgde/etc/rc.d/99-crontab: line 12: lib/omd/init_profile: No such file or directory
/omd/sites/bgde/etc/rc.d/99-crontab: line 20: __init_hook: command not found
Starting crontab.../omd/sites/bgde/etc/rc.d/99-crontab: line 24: /omd/sites/bgde/bin/merge-crontabs: No such file or directory
OK
Also found this:
root@it57-debec1:/omd/versions# ls -la
total 16
drwxr-xr-x 4 root root 4096 Mar 15 15:37 .
drwxr-xr-x 5 root root 4096 Nov 17 2022 ..
drwxr-xr-x 3 root root 4096 Mar 8 2023 5.00-labs-edition
drwxr-xr-x 7 root root 4096 Mar 15 15:36 5.30-labs-edition
lrwxrwxrwx 1 root root 21 Nov 17 2022 default -> /etc/alternatives/omd
This does not look right to me? On other installations default points to one of the folders above?
Hi again, found this:
root@it57-debec1:~# apt-get autoremove
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  omd-5.20-labs-edition
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 2419 MB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 221330 files and directories currently installed.)
Removing omd-5.20-labs-edition (1.ubuntu22.04) ...
Site bgde is still using this version, saving skel/ folder for later upgrade
Looks like autoremove removes the version that is currently in use, which breaks the site. Is there some way to hold back the autoremove until the OMD site has been updated?
I fixed it with "omd update bgde" which thankfully updated everything, and I was able to restart everything again...
That's exactly how it's supposed to work. If you remove an OMD version which is in use, omd will put the site in a state where you can run "omd update" into any new version you install. That's the common way for OS updates, e.g. on Debian or Ubuntu: after the dist-upgrade, you install a new OMD version, su into the site and run omd update once.
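A quick way to tell whether a site is in that state could be to look for the file the rc.d scripts fail to load in the "omd start" output earlier in this thread. This is only a sketch: the function name check_site_version is made up, and it assumes that lib/omd/init_profile inside the site directory is a reliable indicator.

```shell
# Hypothetical helper: succeed if the site can still load its OMD version files.
# Assumption: rc.d scripts source lib/omd/init_profile relative to the site
# directory, as the "No such file or directory" errors in this thread suggest.
check_site_version() {
    site_dir=$1
    if [ -e "$site_dir/lib/omd/init_profile" ]; then
        echo "ok: $site_dir can load lib/omd/init_profile"
        return 0
    fi
    echo "missing lib/omd/init_profile: install a new OMD version and run 'omd update'"
    return 1
}
```

Called as check_site_version /omd/sites/bgde, it should report the missing file for a site whose version was removed.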
@kirk444 can you run ldd bin/naemon? It looks like bin/naemon uses the wrong libnaemon for some reason.
It should look like this:
OMD[test@ubuntu22-04-64]:~$ ldd bin/naemon
linux-vdso.so.1 (0x00007ffc6f3cc000)
libnaemon.so.0 => /omd/sites/test/lib/libnaemon.so.0 (0x00007f51c2180000)
libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007f51c2039000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f51c1e10000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f51c1d9a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f51c1cb3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f51c2268000)
> That's exactly how it's supposed to work. If you remove an OMD version which is in use, omd will put the site in a state where you can run "omd update" into any new version you install. That's the common way for OS updates, e.g. on Debian or Ubuntu: after the dist-upgrade, you install a new OMD version, su into the site and run omd update once.
I guess you are right. I just had autoremove enabled in my autoupdate scripts, so it just stopped working after an automatic update, which isn't ideal :-)
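For unattended updates like this, one possible workaround is to mark the OMD package a site still depends on as manually installed, so that apt-get autoremove leaves it alone until the site has been updated. The sketch below only derives the package name from the site's version symlink. The helper name omd_package_for_site is made up for illustration, and the omd-<version> package naming is assumed from the autoremove output earlier in this thread.

```shell
# Hypothetical helper: print the package name for the OMD version a site uses.
# Assumptions: <site_dir>/version is a symlink whose last path component is the
# version directory name (e.g. 5.20-labs-edition), and packages are named
# omd-<version> as in the apt-get output in this thread.
omd_package_for_site() {
    site_dir=$1
    # readlink fails if the path is not a symlink; pass that failure on
    target=$(readlink "$site_dir/version") || return 1
    printf 'omd-%s\n' "$(basename "$target")"
}
```

Something like apt-mark manual "$(omd_package_for_site /omd/sites/bgde)" before the autoremove step should then keep that package installed. Whether this fits a given update script is untested here.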
> @kirk444 can you run ldd bin/naemon. It looks like bin/naemon uses a wrong libnaemon for some reasons.
Mine looks a bit different. Here's what it looks like on the new and the old versions. That path is a symlink named "version" in the site's root directory.
OMD[sxomd@omdhost]:~$ ldd bin/naemon
linux-vdso.so.1 => (0x00007f106b57a000)
libnaemon.so.0 => /omd/versions/5.30-labs-edition/lib/libnaemon.so.0 (0x00007f106b2ab000)
libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f106af8b000)
libm.so.6 => /lib64/libm.so.6 (0x00007f106ac89000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f106aa85000)
libc.so.6 => /lib64/libc.so.6 (0x00007f106a6b6000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f106a454000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f106a238000)
/lib64/ld-linux-x86-64.so.2 (0x0000564445e50000)
OMD[sxomd@omdhost]:~$ ldd bin/naemon
linux-vdso.so.1 => (0x00007fffa964f000)
libnaemon.so.0 => /omd/versions/5.20-labs-edition/lib/libnaemon.so.0 (0x00007f32738d4000)
libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f32735b4000)
libm.so.6 => /lib64/libm.so.6 (0x00007f32732b2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f32730ae000)
libc.so.6 => /lib64/libc.so.6 (0x00007f3272cdf000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f3272a7d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f3272861000)
/lib64/ld-linux-x86-64.so.2 (0x0000565125c2e000)
does this look similar on your system?
OMD[test@centos7-64]:~$ md5sum lib/libnaemon.so.0
fec9bb8efdff516aabf7fca99e137c6d lib/libnaemon.so.0
OMD[test@centos7-64]:~$ strings lib/libnaemon.so.0 | grep g_uuid_string
g_uuid_string_random
OMD[test@centos7-64]:~$ omd version
OMD - Open Monitoring Distribution Version 5.30-labs-edition, Python version 3.6.8
Yes, the md5sum matches exactly, the same string is present, and the OMD/Python versions are the same.
then i am out of ideas tbh, that version runs fine for hours here.
I'm a bit out of my element here, but it seems like this is the function: https://docs.gtk.org/glib/func.uuid_string_random.html - and according to that documentation it was added in glib 2.52.
My system is currently running glib2 2.50. I updated glib2 (2.50 --> 2.56) and will wait and see if that resolves the issue.
It certainly seems like that was the issue, I will close this as resolved (by updating glib2).
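For anyone hitting the same symbol lookup error on an older distribution, a rough check of the installed GLib version against the 2.52 minimum mentioned above might look like this. It is a sketch that compares only the major and minor numbers, and the function name glib_new_enough is invented for the example.

```shell
# Hypothetical helper: succeed if version $1 is at least $2 (default 2.52,
# the GLib release that added g_uuid_string_random per the linked docs).
# Only major.minor are compared; any patch level is ignored.
glib_new_enough() {
    have=$1
    need=${2:-2.52}
    have_major=${have%%.*}
    have_rest=${have#*.}
    have_minor=${have_rest%%.*}
    need_major=${need%%.*}
    need_rest=${need#*.}
    need_minor=${need_rest%%.*}
    [ "$have_major" -gt "$need_major" ] && return 0
    [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]
}
```

On an RPM-based host such as el7, the version string could come from rpm -q --qf '%{VERSION}' glib2, for example: glib_new_enough "$(rpm -q --qf '%{VERSION}' glib2)" || echo "glib2 is older than 2.52".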
After updating the site from omd 5.20 to omd 5.30 (omd stop; omd update; omd start, with no issues encountered during the change), naemon does not stay running: it repeatedly dies after about 30-60 seconds. If I watch the naemon process with strace, I see the following:
writev(2, [{"/omd/sites/sxomd/bin/naemon", 27}, {": ", 2}, {"symbol lookup error", 19}, {": ", 2}, {"/omd/versions/5.30-labs-edition/"..., 50}, {": ", 2}, {"undefined symbol: g_uuid_string_"..., 38}, {"", 0}, {"", 0}, {"\n", 1}], 10) = 141
Host system is el7 (CentOS 7)
update.log: 2024-02-07 13:57:14 - Updating site 'sxomd' from version 5.20-labs-edition to 5.30-labs-edition...
Executing pre-update script "omd"...OK