a2o / snoopy

Snoopy Command Logger is a small library that logs all program executions on your Linux/BSD system.
GNU General Public License v2.0

snoopy causing NetworkManager to segfault #106

Closed by tkimball83 8 years ago

tkimball83 commented 8 years ago

Architecture: x86_64
Linux distribution: CentOS
Linux kernel: 4.5.3-x86_64-linode67 (www.linode.com)
Distribution version: 7.2.1511
Snoopy version: 2.4.5
Snoopy config file was used: yes
Snoopy threading support enabled: Configured with --enable-everything

This issue may be specific to Linode as I just fired up a node this evening, but I thought I'd report it anyhow.

[root@fremont ~]# service NetworkManager restart
Redirecting to /bin/systemctl restart NetworkManager.service
NetworkManager[3515]: segfault at 556f7d40ecdf ip 00007f0e9e9332f2 sp 00007fff6cdab160 error 4 in libsnoopy.so.0.0.0[7f0e9e928000+f000]
NetworkManager[3517]: segfault at 556f7d48235f ip 00007f0e9e9332f2 sp 00007fff6cdaae30 error 4 in libsnoopy.so.0.0.0[7f0e9e928000+f000]
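
For anyone reading along: the kernel lines above already say where inside the library the crash happened. The value after `ip` is the faulting instruction address and the bracketed pair is the load base and size of the mapping, so subtracting the base from `ip` gives an offset into libsnoopy. A minimal sketch of that arithmetic follows; the function name `segfault_offset` is made up for illustration and is not part of Snoopy.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a kernel "segfault at ... ip ... in lib[base+size]" line and
 * compute the offset of the faulting instruction inside the mapping.
 * Returns 0 on success, -1 if the line does not have the expected shape
 * or the instruction pointer lies outside the reported mapping.
 * (Hypothetical helper, for illustration only.)
 */
int segfault_offset(const char *line, uint64_t *offset)
{
    unsigned long long ip, base, size;
    const char *ip_field  = strstr(line, " ip ");
    const char *map_field = strrchr(line, '[');   /* last '[' starts base+size */

    if (ip_field == NULL || map_field == NULL)
        return -1;
    if (sscanf(ip_field, " ip %llx", &ip) != 1)
        return -1;
    if (sscanf(map_field, "[%llx+%llx]", &base, &size) != 2)
        return -1;
    if (ip < base || ip - base >= size)
        return -1;

    *offset = (uint64_t)(ip - base);
    return 0;
}
```

Both lines above report the same `ip` and the same base, so both crashes are at the same offset, 0xb2f2. If the library was built with debug symbols, something like `addr2line -f -e /usr/lib64/libsnoopy.so 0xb2f2` may resolve that offset to a function and line.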

You can download my custom el7 package here:

https://packagecloud.io/tkimball83/snoopy

My package build instructions are here:

https://github.com/linuxhq/rpmbuild-snoopy/blob/master/SPECS/snoopy.spec

bostjan commented 8 years ago

Do you get a core dump file? Can you produce a function call stack for this? It would help tremendously.

tkimball83 commented 8 years ago

I'm having trouble producing a core dump file. Core dumps are enabled on this node, but NetworkManager won't produce a core file. I tested with a bunk binary I created and cores are generated successfully for it, just not for NetworkManager :(
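
One possible explanation, offered as a guess rather than a diagnosis: the core-size limit is a per-process resource limit inherited from the parent, so enabling cores in an interactive shell does not necessarily change what a daemon started by the service manager sees. The sketch below shows how a process can inspect and raise its own soft limit with `getrlimit`/`setrlimit`; the helper names are invented for illustration.

```c
#include <sys/resource.h>

/*
 * Report whether the calling process may write core files.
 * Returns 1 if the soft RLIMIT_CORE is non-zero, 0 if it is zero,
 * and -1 if the limit could not be read.
 * (Hypothetical helper, for illustration only.)
 */
int core_dumps_enabled(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/*
 * Raise the soft core-size limit to the hard limit, which an
 * unprivileged process is always allowed to do.
 * Returns 0 on success, -1 on failure.
 */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

For a process that is already running, the effective value is visible in the "Max core file size" row of `/proc/<pid>/limits`, which is a quick way to compare the test binary against NetworkManager.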

bostjan commented 8 years ago

Does it fail also:

A bit short on time atm, sorry.

bostjan commented 8 years ago

Works for me (minimal install + install group "GNOME Desktop" + graphical.target set + static network config).

Do you have anything else configured that is drastically different? Can you retest with stock CentOS 7 + snoopy + a similar network config?

tkimball83 commented 8 years ago

Are you using NetworkManager with DHCP? I wasn't using static address settings.

p64 commented 8 years ago

I have the same issue with DHCP. Running as a guest in KVM. kernel - 3.10.0-327.22.2.el7.x86_64.

@tkimball83 - I noticed your spec file actually says "%configure --enable-everything". They say that is only for testing purposes. However, I built without it and got the same result. I'll post my spec file shortly.

tkimball83 commented 8 years ago

I'm happy to see that it's most likely not kernel related, given the big gap in versions.

@p64 - Were you able to produce a core dump?

bostjan commented 8 years ago

After initial investigation, this has something to do with the configuration file, more specifically with the imported iniparser library.

Adding --disable-config-file to the end of the ./configure line solves the problem. I know this is not a solution, just a workaround.

bostjan commented 8 years ago

Also, the problem is not NM itself but rather dhclient, which gets executed as its subprocess and dies with a segmentation fault (signal 11).

No clue why, as of this moment.

p64 commented 8 years ago

@tkimball83 - I didn't get that far. I can try, but it sounds like @bostjan is already past that point. I can still attempt to capture one if need be.

bostjan commented 8 years ago

@p64 and @tkimball83 - if either of you is able to obtain a core dump of the crashed process, it would be tremendously helpful.

p64 commented 8 years ago

@bostjan - Sometimes I miss the good ol' days when core dumps just actually showed up. Still trying to produce one with a variety of failed methods. Even tried running dhclient in the foreground which was of little value. Still working on it.

p64 commented 8 years ago

@bostjan - it seems to be NetworkManager that is core dumping.

I actually spent a few cycles trying to run dhclient manually, which was a disaster since it gave me an IP. :)

[root@centos7 ~]# gdb /usr/sbin/NetworkManager core.2127
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/NetworkManager...Reading symbols from /usr/lib/debug/usr/sbin/NetworkManager.debug...done.
done.
[New LWP 2127]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/NetworkManager --no-daemon -d'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fabb83d1fa2 in iniparser_load () from /usr/lib64/libsnoopy.so
Missing separate debuginfos, use: debuginfo-install ModemManager-glib-1.1.0-8.git20130913.el7.x86_64 bluez-libs-5.23-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 dbus-glib-0.100-7.el7.x86_64 dbus-libs-1.6.12-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.6.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libffi-3.0.13-16.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libgudev1-219-19.el7_2.11.x86_64 libndp-1.2-6.el7_2.x86_64 libnl3-3.2.21-10.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libsoup-2.48.1-3.el7.x86_64 libuuid-2.23.2-26.el7_2.2.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 nspr-4.11.0-1.el7_2.x86_64 nss-3.21.0-9.el7_2.x86_64 nss-util-3.21.0-2.2.el7_2.x86_64 pcre-8.32-15.el7_2.1.x86_64 snoopy-2.4.5-1.el7.centos.x86_64 sqlite-3.7.17-8.el7.x86_64 systemd-libs-219-19.el7_2.11.x86_64 teamd-1.17-6.el7_2.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) where
#0  0x00007fabb83d1fa2 in iniparser_load () from /usr/lib64/libsnoopy.so
#1  0x00007fabb83ce9af in snoopy_configfile_load ()
   from /usr/lib64/libsnoopy.so
#2  0x00007fabb83cde5b in snoopy_init () from /usr/lib64/libsnoopy.so
#3  0x00007fabb83cdada in snoopy_log_syscall_exec ()
   from /usr/lib64/libsnoopy.so
#4  0x00007fabb83cdba1 in snoopy_log_syscall_execv ()
   from /usr/lib64/libsnoopy.so
#5  0x00007fabb83d2761 in execv () from /usr/lib64/libsnoopy.so
#6  0x00007fabb5077cf7 in do_exec () from /lib64/libglib-2.0.so.0
#7  0x00007fabb50786fe in fork_exec_with_pipes () from /lib64/libglib-2.0.so.0
#8  0x00007fabb5079236 in g_spawn_async_with_pipes ()
   from /lib64/libglib-2.0.so.0
#9  0x00007fabb507933c in g_spawn_async () from /lib64/libglib-2.0.so.0
#10 0x00007fabb884c50a in dhclient_start (client=client@entry=0x7fabb9ee0a80,
    mode_opt=mode_opt@entry=0x0, duid=duid@entry=0x0, release=release@entry=0,
    out_pid=out_pid@entry=0x0) at dhcp-manager/nm-dhcp-dhclient.c:443
#11 0x00007fabb884cd84 in ip4_start (client=0x7fabb9ee0a80,
    dhcp_anycast_addr=0x0, last_ip4_address=<optimized out>)
    at dhcp-manager/nm-dhcp-dhclient.c:483
#12 0x00007fabb8872b1a in client_start (self=self@entry=0x7fabb9e9df60,
    iface=iface@entry=0x7fabb9f04120 "eth0", ifindex=ifindex@entry=2,
    hwaddr=hwaddr@entry=0x7fabb9f14330,
    uuid=uuid@entry=0x7fabb9eddfd0 "f2582e6b-2a9b-4f2a-8e6d-7a4e452518b0",
    priority=priority@entry=100, ipv6=ipv6@entry=0,
    dhcp_client_id=dhcp_client_id@entry=0x0, timeout=timeout@entry=0,
    dhcp_anycast_addr=dhcp_anycast_addr@entry=0x0,
    hostname=hostname@entry=0x7fabb9ef13d0 "centos7.pier64.com",
    info_only=info_only@entry=0,
    privacy=privacy@entry=NM_SETTING_IP6_CONFIG_PRIVACY_DISABLED,
    last_ip4_address=last_ip4_address@entry=0x0)
    at dhcp-manager/nm-dhcp-manager.c:270
#13 0x00007fabb8872d80 in nm_dhcp_manager_start_ip4 (self=0x7fabb9e9df60,
    iface=iface@entry=0x7fabb9f04120 "eth0", ifindex=ifindex@entry=2,
    hwaddr=hwaddr@entry=0x7fabb9f14330,
    uuid=uuid@entry=0x7fabb9eddfd0 "f2582e6b-2a9b-4f2a-8e6d-7a4e452518b0",
    priority=priority@entry=100, send_hostname=send_hostname@entry=1,
    dhcp_hostname=0x7fabb9ef13d0 "centos7.pier64.com",
    dhcp_hostname@entry=0x0, dhcp_client_id=dhcp_client_id@entry=0x0,
    timeout=timeout@entry=0, dhcp_anycast_addr=dhcp_anycast_addr@entry=0x0,
    last_ip_address=last_ip_address@entry=0x0)
    at dhcp-manager/nm-dhcp-manager.c:310
#14 0x00007fabb885d839 in dhcp4_start (self=self@entry=0x7fabb9f04b70,
    connection=connection@entry=0x7fabb9ede190,
    reason=reason@entry=0x7ffeee242eec) at devices/nm-device.c:3622
#15 0x00007fabb885db3e in act_stage3_ip4_config_start (self=0x7fabb9f04b70,
    out_config=0x7ffeee242ef0, reason=0x7ffeee242eec)
    at devices/nm-device.c:3894
#16 0x00007fabb8842f58 in act_stage3_ip4_config_start (device=0x7fabb9f04b70,
    out_config=0x7ffeee242ef0, reason=0x7ffeee242eec)
    at devices/nm-device-ethernet.c:1321
#17 0x00007fabb8866af0 in nm_device_activate_stage3_ip4_start (
    self=self@entry=0x7fabb9f04b70) at devices/nm-device.c:5266
#18 0x00007fabb8867388 in nm_device_activate_stage3_ip_config_start (
    user_data=<optimized out>) at devices/nm-device.c:5415
#19 0x00007fabb50337aa in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#20 0x00007fabb5033af8 in g_main_context_iterate.isra.24 ()
   from /lib64/libglib-2.0.so.0
#21 0x00007fabb5033dca in g_main_loop_run () from /lib64/libglib-2.0.so.0
#22 0x00007fabb883d123 in main (argc=1, argv=0x7ffeee243248) at main.c:512
(gdb)

Does that help?

core.2127.gz
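
A note on reading frames #5 through #9 of the backtrace: `g_spawn_async` forks, and the child calls `execv`, which resolves to the wrapper in libsnoopy.so rather than to libc. That would explain why the crash is reported under NetworkManager's name even though the command being launched is dhclient: the wrapper runs in the forked copy of NetworkManager before the new program replaces it. The sketch below illustrates the general interposition idea only. It is not Snoopy's code; `log_exec` and `logged_calls` are hypothetical, and it forwards to `execve` with the current environment for simplicity, whereas a preload library would more typically look up the next `execv` with `dlsym(RTLD_NEXT, "execv")`.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Number of execv calls seen by the wrapper (illustration only). */
static int logged_calls = 0;

/*
 * Hypothetical logging hook. It uses write(2) only, because the wrapper
 * may run in a child between fork() and exec(), where only
 * async-signal-safe calls are safe in a multithreaded parent.
 */
static void log_exec(const char *path)
{
    static const char prefix[] = "exec: ";
    ssize_t n;

    logged_calls++;
    n = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    n = write(STDERR_FILENO, path, strlen(path));
    n = write(STDERR_FILENO, "\n", 1);
    (void)n; /* logging is best effort; failures are ignored */
}

/*
 * Interposed execv: same signature as the libc function, so the dynamic
 * linker binds callers to this definition when it comes first in the
 * lookup order (for example via LD_PRELOAD).
 */
int execv(const char *path, char *const argv[])
{
    log_exec(path);
    return execve(path, argv, environ);
}
```

Built as a shared object (`gcc -shared -fPIC`) and loaded through `LD_PRELOAD`, any bug in the code that runs before the final `execve`, such as config file loading, would crash the calling process in exactly this position on the stack.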

bostjan commented 8 years ago

@p64 Yeah, I got that far in the meantime too.

Dissecting iniparser is really not high on my todo list, so just replacing it with an alternative implementation is the more viable approach.

Bugfix is available in branch bugfix/github-106-nm-segfault. Works for me, no NM segfault, IP obtained via DHCP.

Please be kind enough to test the bugfix and report back the results. Thank you.
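
For readers wondering what a replacement parser has to get right: the root cause inside iniparser was not identified in this thread, so the following is only a sketch of the kind of bounds-checked "key = value" line parsing a small INI reader needs. It is not the code in the bugfix branch, and `parse_ini_line` with its buffer sizes is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns a pointer into s. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Parse one line of an INI-style file into caller-supplied buffers.
 * Returns 1 for a key/value pair, 0 for a blank line, comment or
 * section header, and -1 if the line is malformed or does not fit.
 * (Hypothetical helper, for illustration only.)
 */
int parse_ini_line(const char *line, char *key, size_t key_sz,
                   char *val, size_t val_sz)
{
    char buf[1024];
    char *s, *eq, *k, *v;

    if (strlen(line) >= sizeof buf)
        return -1;                      /* refuse over-long lines */
    strcpy(buf, line);                  /* safe: length checked above */

    s = trim(buf);
    if (*s == '\0' || *s == '#' || *s == ';' || *s == '[')
        return 0;

    eq = strchr(s, '=');
    if (eq == NULL)
        return -1;
    *eq = '\0';
    k = trim(s);
    v = trim(eq + 1);

    if (*k == '\0' || strlen(k) >= key_sz || strlen(v) >= val_sz)
        return -1;                      /* never overflow caller buffers */
    strcpy(key, k);
    strcpy(val, v);
    return 1;
}
```

Every copy is preceded by a length check and nothing is allocated on the heap, which keeps the code simple enough to run inside an `exec` wrapper.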

jmtysonjr commented 8 years ago

@bostjan your fix worked for me, thanks! RHEL7.2

bostjan commented 8 years ago

Fix released as version 2.4.6. Thanks for verification.