Orange-OpenSource / p4rt-ovs

Programming runtime extensions for Open vSwitch with P4
Apache License 2.0
24 stars 9 forks source link

Ovs-vswitchd aborted during mininet shutdown (assertion failed) #8

Closed tatry closed 4 years ago

tatry commented 4 years ago

When I shutdown mininet, ovs-vswitchd gets signal SIGABRT. Mininet also stops on deleting switches, but restart of ovs-vswitchd allows to continue mininet execution.

To trigger this issue, I need to do following steps:

  1. Run my network in mininet To launch my network, I used script demo_net.py in which:

    • changed switch to OVSSwitch;
    • passed option datapath='user' to switch (compare with this Readme).
  2. Load uBPF program into switch I used tunneling.p4 as a uBPF program. I load it into one of the four switches:

    $ sudo ovs-ofctl load-bpf-prog s3 1 tunneling.o
    $ sudo ovs-ofctl show-bpf-prog s3
    NXT_BPF_SHOW_PROG_REPLY (xid=0x4):
    The number of BPF programs already loaded: 1
    id 1:   loaded_at=2020-03-07 T22:14:25 
    map_ids 0,1

    I do not insert data into created maps.

  3. Shutdown mininet Mininet stops at this state:

    
    mininet> exit
    *** Stopping 0 controllers

Stopping 11 links ........... Stopping 4 switches


As the result in the terminal, where ovs-vswitchd has been run, I got following logs:

2020-03-07T21:15:19Z|00096|bridge|INFO|bridge s3: deleted interface s3-eth4 on port 4 ovs-vswitchd(urcu3): ofproto/ofproto.c:1682: assertion hmap_is_empty(&ofproto->ubpf_vms) failed in ofproto_destroy__() Aborted (core dumped)


When I repeat this scenario with gdb, I got following logs for ovs-vswitchd:

2020-03-08T14:11:29Z|00096|bridge|INFO|bridge s3: deleted interface s3-eth4 on port 4 [Thread 0x7ffff0131700 (LWP 5769) exited] [Thread 0x7fffef930700 (LWP 5770) exited] [New Thread 0x7fffef930700 (LWP 5791)] [New Thread 0x7ffff0131700 (LWP 5792)] ovs-vswitchd(urcu2): ofproto/ofproto.c:1682: assertion hmap_is_empty(&ofproto->ubpf_vms) failed in ofproto_destroy__()

Thread 3 "urcu2" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff19fe700 (LWP 5647)] __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt

0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51

1 0x00007ffff5502801 in __GI_abort () at abort.c:79

2 0x0000555555a12d53 in ovs_abort_valist (err_no=err_no@entry=0,

format=format@entry=0x555555c16ba0 "%s: assertion %s failed in %s()", 
args=args@entry=0x7ffff19fd850) at lib/util.c:419

3 0x0000555555a25489 in vlog_abort_valist (

module_=module_@entry=0x555555f956c0 <this_module>, 
message=message@entry=0x555555c16ba0 "%s: assertion %s failed in %s()", 
args=args@entry=0x7ffff19fd850) at lib/vlog.c:1227

4 0x0000555555a2558c in vlog_abort (

module=module@entry=0x555555f956c0 <this_module>, 
message=message@entry=0x555555c16ba0 "%s: assertion %s failed in %s()")
at lib/vlog.c:1241

5 0x0000555555a1291b in ovs_assert_failure (

where=where@entry=0x555555b7bfc0 "ofproto/ofproto.c:1682", 
function=function@entry=0x555555b7fde0 <__func__.24577> "ofproto_destroy__", condition=condition@entry=0x555555b7bf60 "hmap_is_empty(&ofproto->ubpf_vms)")
at lib/util.c:86

6 0x00005555556bc1cf in ofproto_destroy__ (ofproto=0x61b00000cba0)

at ofproto/ofproto.c:1682

7 0x0000555555979a99 in ovsrcu_call_postponed () at lib/ovs-rcu.c:342

8 0x0000555555979c94 in ovsrcu_postpone_thread (arg=)

at lib/ovs-rcu.c:357

9 0x000055555597feea in ovsthreadwrapper (aux=)

at lib/ovs-thread.c:383

10 0x00007ffff5e606db in start_thread (arg=0x7ffff19fe700)

at pthread_create.c:463

11 0x00007ffff55e388f in clone ()

at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


It seems to be connected with the issue #3 
osinstom commented 4 years ago

I've just created PR for this issue. @tatry, Could you check if this problem does not appear anymore?

tatry commented 4 years ago

I've tried this PR and got following error in ovs-vswitchd:

2020-03-09T20:13:08Z|00180|bridge|INFO|bridge s3: deleted interface s3-eth4 on port 4
=================================================================
==12502==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b000014420 at pc 0x55568da7ba90 bp 0x7f68d69fd990 sp 0x7f68d69fd980
READ of size 8 at 0x60b000014420 thread T2 (urcu3)
2020-03-09T20:13:08Z|00181|bridge|INFO|bridge s2: deleted interface s2-eth4 on port 4
    #0 0x55568da7ba8f in hmap_next include/openvswitch/hmap.h:398
    #1 0x55568da7ba8f in ofproto_destroy__ ofproto/ofproto.c:1682
    #2 0x55568dd29cc8 in ovsrcu_call_postponed lib/ovs-rcu.c:342
    #3 0x55568dd29ec3 in ovsrcu_postpone_thread lib/ovs-rcu.c:357
    #4 0x55568dd30119 in ovsthread_wrapper lib/ovs-thread.c:383
    #5 0x7f68dae526da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #6 0x7f68da5d588e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

0x60b000014420 is located 16 bytes inside of 104-byte region [0x60b000014410,0x60b000014478)
freed by thread T2 (urcu3) here:
    #0 0x7f68dbeea7b8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7b8)
    #1 0x55568da7b8b2 in ofproto_destroy__ ofproto/ofproto.c:1684
    #2 0x55568dd29cc8 in ovsrcu_call_postponed lib/ovs-rcu.c:342
    #3 0x55568dd29ec3 in ovsrcu_postpone_thread lib/ovs-rcu.c:357
    #4 0x55568dd30119 in ovsthread_wrapper lib/ovs-thread.c:383
    #5 0x7f68dae526da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)

previously allocated by thread T0 here:
    #0 0x7f68dbeead38 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded38)
    #1 0x55568ddc30d2 in xcalloc lib/util.c:121
    #2 0x55568ddda4b8 in ubpf_create lib/bpf/ubpf_vm.c:135
    #3 0x55568ddddfb0 in create_ubpf_vm lib/bpf.c:35
    #4 0x55568da8f8dc in handle_bpf_load_prog ofproto/ofproto.c:6304
    #5 0x55568da8f8dc in handle_single_part_openflow ofproto/ofproto.c:8885
    #6 0x55568da8f8dc in handle_openflow ofproto/ofproto.c:9082
    #7 0x55568db38162 in ofconn_run ofproto/connmgr.c:1321
    #8 0x55568db38162 in connmgr_run ofproto/connmgr.c:356
    #9 0x55568da74156 in ofproto_run ofproto/ofproto.c:1899
    #10 0x55568da3ae15 in bridge_run__ vswitchd/bridge.c:3246
    #11 0x55568da4de54 in bridge_run vswitchd/bridge.c:3305
    #12 0x55568da2d933 in main vswitchd/ovs-vswitchd.c:127
    #13 0x7f68da4d5b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

Thread T2 (urcu3) created by T1 (ct_clean1) here:
    #0 0x7f68dbe43d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
    #1 0x55568dd303de in ovs_thread_create lib/ovs-thread.c:447
    #2 0x55568dd29e81 in ovsrcu_quiesced lib/ovs-rcu.c:120
    #3 0x55568ddad54f in time_poll lib/timeval.c:321
    #4 0x55568dd6df8a in poll_block lib/poll-loop.c:364
    #5 0x55568ded57fe in clean_thread_main lib/conntrack.c:1563
    #6 0x55568dd30119 in ovsthread_wrapper lib/ovs-thread.c:383
    #7 0x7f68dae526da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)

Thread T1 (ct_clean1) created by T0 here:
    #0 0x7f68dbe43d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
    #1 0x55568dd303de in ovs_thread_create lib/ovs-thread.c:447
    #2 0x55568ded93a8 in conntrack_init lib/conntrack.c:322
    #3 0x55568db91b9e in create_dp_netdev lib/dpif-netdev.c:1568
    #4 0x55568db91b9e in dpif_netdev_open lib/dpif-netdev.c:1624
    #5 0x55568dbad0a2 in do_open lib/dpif.c:347
    #6 0x55568dbad560 in dpif_create lib/dpif.c:401
    #7 0x55568dbad560 in dpif_create_and_open lib/dpif.c:414
    #8 0x55568dabd923 in open_dpif_backer ofproto/ofproto-dpif.c:774
    #9 0x55568dabd923 in construct ofproto/ofproto-dpif.c:1605
    #10 0x55568da7c4d8 in ofproto_create ofproto/ofproto.c:558
    #11 0x55568da46a1c in bridge_reconfigure vswitchd/bridge.c:878
    #12 0x55568da4df6d in bridge_run vswitchd/bridge.c:3326
    #13 0x55568da2d933 in main vswitchd/ovs-vswitchd.c:127
    #14 0x7f68da4d5b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

SUMMARY: AddressSanitizer: heap-use-after-free include/openvswitch/hmap.h:398 in hmap_next
Shadow bytes around the buggy address:
  0x0c167fffa830: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c167fffa840: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
  0x0c167fffa850: fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd fd
  0x0c167fffa860: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c167fffa870: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
=>0x0c167fffa880: fa fa fd fd[fd]fd fd fd fd fd fd fd fd fd fd fa
  0x0c167fffa890: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c167fffa8a0: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
  0x0c167fffa8b0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c167fffa8c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c167fffa8d0: fa fa fa fa fa fa fa fa fa fa fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==12502==ABORTING

Ovs-vswitchd have been terminated.

osinstom commented 4 years ago

I've tried this PR and got following error in ovs-vswitchd:

Ovs-vswitchd have been terminated.

@tatry Could you list the steps to reproduce this error? I've tried it locally and ovs-vswitchd is stopped without any error.

tatry commented 4 years ago

After merge PR #10 into master I cannot reproduce this error anymore. Thank you.

PS. I'm not sure if I used git correctly to check the changes.

EDIT: I forgot that I had passes additional option to ./configure: CFLAGS="-g -O2 -fsanitize=address -fno-omit-frame-pointer -fno-common", as said this page for debugging purposes. When I rebuild with these options, I again get this error. So, its apply for reopen?

osinstom commented 4 years ago

@tatry I've prepared the new fix. Could you re-check if problem occurs using issue-8 branch ?

tatry commented 4 years ago

I checked your fix, still get the same error.

I'd done some some little investigate in the problem. In the file hmap.h I'd found macro HMAP_FOR_EACH_SAFE and used it instead of HMAP_FOR_EACH in your fix. Then I didn't get that error. Below my changes:

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index e7c93d26e..3e1bfa8e6 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -1679,7 +1679,8 @@ ofproto_destroy__(struct ofproto *ofproto)
     }
     free(ofproto->tables);
     struct ubpf_vm *vm;
-    HMAP_FOR_EACH(vm, hmap_node, &ofproto->ubpf_vms) {
+    struct ubpf_vm *vm_next;
+    HMAP_FOR_EACH_SAFE(vm, vm_next, hmap_node, &ofproto->ubpf_vms) {
         ubpf_destroy(vm);
     }
     hmap_clear(&ofproto->ubpf_vms);
osinstom commented 4 years ago

Great! Could you create Pull Request with your changes ?