GNS3 / ubridge

Bridge for UDP tunnels, Ethernet, TAP and VMnet interfaces.
GNU General Public License v3.0

"ubridge -e" causes a core dump on Fedora Workstation 36 #81

Open kefins opened 1 year ago

kefins commented 1 year ago

Hi, guys.

I got a core dump while running "ubridge -e" on Fedora Workstation 36. Here is the detailed output.

[root@fedora ubridge]#uname -a
Linux fedora 5.17.5-300.fc36.x86_64 #1 SMP PREEMPT Thu Apr 28 15:51:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

[root@fedora ubridge]#./ubridge -e
Network device list:

Segmentation fault (core dumped)

[root@fedora ubridge]#ldd ./ubridge 
        linux-vdso.so.1 (0x00007ffed33fd000)
        libpcap.so.1 => /lib64/libpcap.so.1 (0x00007f668c9f0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f668c7ee000)
        libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007f668c7cc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f668ca54000)
        libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f668c746000)
        libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f668c722000)

grossmj commented 1 year ago

Thanks for reporting. This may be a problem with the installed libpcap or the way we call pcap_findalldevs_ex(): https://github.com/GNS3/ubridge/blob/master/src/ubridge.c#L330L354

kefins commented 1 year ago

I ran the program on Fedora Workstation 36, where the CYGWIN macro is not defined, so the program calls pcap_findalldevs, and that call causes the core dump. After debugging, I found that the crash happens in the routine nlmsg_inherit in libnl, which dereferences a null pointer.

#0  nlmsg_inherit (hdr=hdr@entry=0x7fffffffdc60) at lib/msg.c:329
#1  0x00007ffff7c9c3e8 in nlmsg_alloc_simple (nlmsgtype=nlmsgtype@entry=5121, flags=flags@entry=768) at lib/msg.c:358
#2  0x00007ffff7ca0540 in nl_send_simple (sk=sk@entry=0x41b430, type=type@entry=5121, flags=flags@entry=768, buf=buf@entry=0x0, size=size@entry=0) at lib/nl.c:587
#3  0x00007ffff7d4e79d in rdmanl_get_devices (cb_func=0x7ffff7d4f300 <find_sysfs_devs_nl_cb>, data=0x7fffffffde80, nl=0x41b430) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/util/rdma_nl.c:113
#4  find_sysfs_devs_nl (tmp_sysfs_dev_list=0x7fffffffde80) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/ibdev_nl.c:200
#5  0x00007ffff7d4c362 in ibverbs_get_device_list (device_list=0x7ffff7d59010 <device_list.lto_priv>) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/init.c:560
#6  __ibv_get_device_list_1_1 (num=num@entry=0x7fffffffdee4) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/device.c:74
#7  0x00007ffff7f69450 in rdmasniff_findalldevs (devlistp=0x7fffffffdf68, err_str=0x7fffffffdfd0 '#' <repeats 16 times>) at ./pcap-rdmasniff.c:437
#8  0x00007ffff7f69b02 in pcap_findalldevs (alldevsp=<optimized out>, errbuf=<optimized out>) at ./pcap.c:732
#9  0x000000000040460f in display_network_devices () at src/ubridge.c:339
#10 0x000000000040486b in main (argc=2, argv=0x7fffffffe248) at src/ubridge.c:409
(gdb) l
324     struct nl_msg *nlmsg_inherit(struct nlmsghdr *hdr)
325     {
326             struct nl_msg *nm;
327
328             nm = nlmsg_alloc();
329             if (nm && hdr) {
330                     struct nlmsghdr *new = nm->nm_nlh;
331
332                     new->nlmsg_type = hdr->nlmsg_type;
333                     new->nlmsg_flags = hdr->nlmsg_flags;

At line 332, the pointer new is null. I think this is a bug in libnl, so we should change nlmsg_alloc in lib/nl.c.

kefins commented 1 year ago

> Thanks for reporting. This may be a problem with the installed libpcap or the way we call pcap_findalldevs_ex(): https://github.com/GNS3/ubridge/blob/master/src/ubridge.c#L330L354

I finally figured it out. It is actually a linking problem: the routines in src/netlink/nl.c have the same names as routines in libnl-3, which confuses symbol resolution for libibverbs. libibverbs should call the routines in libnl-3, but because of the name clash it calls the ones in src/netlink/nl.c instead, which results in the core dump. My fix was to rename all the routines in src/netlink/nl.c, for example by adding a ubridge_ prefix. With that change, ubridge -e lists all network devices properly.

[parkeryan@fedora gg]$ ./ubridge -e
Network device list:

  ens160 => no description
  any => Pseudo-device that captures on all interfaces
  lo => no description
  bluetooth-monitor => Bluetooth Linux Monitor
  usbmon2 => Raw USB traffic, bus number 2
  usbmon1 => Raw USB traffic, bus number 1
  usbmon0 => Raw USB traffic, all USB buses
  nflog => Linux netfilter log (NFLOG) interface
  nfqueue => Linux netfilter queue (NFQUEUE) interface
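
For illustration, here is a minimal sketch of the renaming idea. It is not the actual ubridge patch: the struct and the function bodies are hypothetical and heavily simplified, and only the ubridge_ prefix comes from the fix described above. The point is that the program's own helpers get names that a shared library looking up nlmsg_alloc or nlmsg_free cannot be bound to by mistake.

```c
#include <stdlib.h>

/* Hypothetical, simplified message type. This is NOT the real struct
 * from ubridge or libnl-3; it only exists to make the sketch compile. */
struct ubridge_nl_msg {
    size_t size;             /* payload size requested by the caller */
    unsigned char *payload;  /* zero-initialised payload buffer */
};

/* Prefixed name: the symbol is "ubridge_nlmsg_alloc", not "nlmsg_alloc",
 * so it cannot shadow a libnl-3 routine of the unprefixed name. */
struct ubridge_nl_msg *ubridge_nlmsg_alloc(size_t size)
{
    struct ubridge_nl_msg *msg = calloc(1, sizeof(*msg));
    if (msg == NULL)
        return NULL;

    /* Allocate at least one byte so a zero-size request still yields
     * a valid, non-NULL payload pointer. */
    msg->payload = calloc(1, size ? size : 1);
    if (msg->payload == NULL) {
        free(msg);
        return NULL;
    }
    msg->size = size;
    return msg;
}

/* Prefixed counterpart for releasing a message; accepts NULL. */
void ubridge_nlmsg_free(struct ubridge_nl_msg *msg)
{
    if (msg == NULL)
        return;
    free(msg->payload);
    free(msg);
}
```

Helpers that are only used inside one source file could alternatively be declared static, which keeps them out of the symbol table entirely. Either way, the goal is the same: the program should not export names that libnl-3 also defines.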

grossmj commented 1 year ago

I am a bit confused, where is libibverbs linked to?