cadets / freebsd-old

FreeBSD src tree http://www.FreeBSD.org/

dtrace deadlock when running under some stress #147

Closed (lc525 closed this issue 2 years ago)

lc525 commented 2 years ago

On commit 01735d95960

While running some simple NFS tests with fio inside a VM and tracing from the host, I got dtrace to deadlock.

The minimal script exhibiting this behaviour:

dtrace -E -n '*:syscall:::entry/(execname == "fio" ) / { @[vmname, probefunc] = count ( ) ; }'

DTrace deadlocks, and Ctrl+C no longer stops the script. Pressing Ctrl+T on the hung dtrace process prints:

ctrl-t: load: 0.06  cmd: dtrace 5146 [umtxn] 907.93r 0.97u 0.23s 0% 146976k
mi_switch+0x155 sleepq_switch+0x119 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _sleep+0x325 umtxq_sleep+0x1c3 do_lock_umutex+0x7c8 __umtx_op_wait_umutex+0x49 sys__umtx_op+0x7a amd64_syscall+0x5c0 fast_syscall_common+0xf8

dstolfa commented 2 years ago

I believe I hit this issue while trying to reproduce #146; however, I was able to kill it with kill -9 without any trouble:

  ioctl                                             
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |                                         1        
            1024 |@@                                       42       
            2048 |@@@@@@@@@@                               234      
            4096 |@@@@@@@@@@@@@@@@                         367      
            8192 |@@@@                                     98       
           16384 |@                                        14       
           32768 |                                         5        
           65536 |                                         5        
          131072 |                                         1        
          262144 |                                         2        
          524288 |                                         0        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         0        
        16777216 |                                         0        
        33554432 |                                         0        
        67108864 |@@@                                      62       
       134217728 |@                                        31       
       268435456 |@@                                       49       
       536870912 |@                                        20       
      1073741824 |                                         1        
      2147483648 |                                         1        
      4294967296 |                                         0        

^C^C^C^C^C^C^C^C
load: 0.08  cmd: dtrace 4039 [umtxn] 18.63r 0.26u 0.46s 0% 89860k
mi_switch+0x155 sleepq_switch+0x119 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _sleep+0x325 umtxq_sleep+0x1c3 do_lock_umutex+0x7c8 __umtx_op_wait_umutex+0x49 sys__umtx_op+0x7a amd64_syscall+0x5c0 fast_syscall_common+0xf8 
Killed

I believe this is related to the cleanup code that runs when dtrace(1) processes SIGTERM (Ctrl+C delivers SIGINT). I will look into it.
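For context: the Ctrl+T traces show dtrace blocked in do_lock_umutex (wait channel umtxn), i.e. waiting to acquire a userspace pthread mutex. One common way a signal-driven cleanup path ends up in that state is a handler that tries to take a lock the interrupted code already holds. The sketch below is a generic C illustration of the usual async-signal-safe pattern, assuming a single consumer loop guarded by one mutex. It is not the fix that was committed, and every name in it (consumer_lock, stop_requested, consume_once, cleanup_consumer, cleanup_runs) is hypothetical and does not come from dtrace(1) or libdtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical shared state; not taken from dtrace(1) or libdtrace. */
static pthread_mutex_t consumer_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t stop_requested = 0;
static int cleanup_runs = 0;

/*
 * Async-signal-safe handler: it only records that a stop was requested.
 * It must not call pthread_mutex_lock() (or other non-async-signal-safe
 * functions), because the interrupted code may already hold the lock.
 */
static void on_stop_signal(int sig)
{
	(void)sig;
	stop_requested = 1;
}

/* Install the handler for SIGINT (Ctrl+C) and SIGTERM; 0 on success. */
static int install_stop_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_stop_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) != 0)
		return -1;
	if (sigaction(SIGTERM, &sa, NULL) != 0)
		return -1;
	return 0;
}

/* Lock-taking cleanup; only ever called from normal (non-handler) context. */
static void cleanup_consumer(void)
{
	assert(stop_requested); /* cleanup is expected only after a stop request */
	pthread_mutex_lock(&consumer_lock);
	cleanup_runs++;
	pthread_mutex_unlock(&consumer_lock);
}

/*
 * One iteration of a hypothetical consume loop. A signal arriving while the
 * lock is held is harmless because the handler never touches the lock.
 * Returns true when the caller should stop looping.
 */
static bool consume_once(void)
{
	pthread_mutex_lock(&consumer_lock);
	/* ... process buffered trace data here ... */
	pthread_mutex_unlock(&consumer_lock);

	if (stop_requested) {
		cleanup_consumer();
		return true;
	}
	return false;
}
```

The design choice is that the handler only sets a volatile sig_atomic_t flag, and anything that needs the mutex runs later from the main loop, so a signal delivered while the lock is held cannot leave the process waiting on itself.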

dstolfa commented 2 years ago

@lc525 Could you please confirm that this is resolved by the latest commit? I can no longer trigger the problem.

lc525 commented 2 years ago

Haven't encountered this since the fix.