columbia / racepro

Racepro
0 stars 2 forks source link

upstart: waitpid() causes resource deadlock #10

Closed chiache closed 13 years ago

chiache commented 13 years ago

This is the "modified replay" of upstart. The logs are located at hp05:/persisit/need-diagnose/upstart-expectfork

Original log: hp05:/persist/need-diagnose/upstart-expectfork/out.log Modified log: hp05:/persisit/need-diagnose/upstart-expectfork/out.1.log (all 4 logs reproduce the same problem)

backtrace:

INFO:root: running pre-replay callback... INFO:root: replaying ... INFO:root:timeout: 45, kill: True INFO:root:context.wait() [3] INFO:root: Backtrace (50 events): [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Last events of each known process: [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Replay Diverged, err = 35 (Resource deadlock avoided) RACE 1: replay deadlock

A possible deadlock shows up in dmesg (might not be relevant):

[ 393.617517] ======================================================= [ 393.621095] [ INFO: possible circular locking dependency detected ] [ 393.621095] 2.6.35-scribe+ #25 [ 393.621095] ------------------------------------------------------- [ 393.621095] killall5/1289 is trying to acquire lock: 393.621095{++++++}, at: [] lock_object+0x2a0/0xa60 [ 393.621095] [ 393.621095] but task is already holding lock: 393.621095{+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] which lock already depends on the new lock. [ 393.621095] [ 393.621095] [ 393.621095] the existing dependency chain (in reverse order) is: [ 393.621095] [ 393.621095] -> #2 (SCRIBE_RES_TYPE_FILE){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_write_nested+0x55/0x80 [ 393.621095] [] lock_object+0x774/0xa60 [ 393.621095] [] __scribe_lock_objects+0x5a/0x150 [ 393.621095] [] lock_file+0xf7/0x140 [ 393.621095] [] scribe_pre_fput+0xbe/0xd0 [ 393.621095] [] fput+0x1a/0x240 [ 393.621095] [] filp_close+0x4c/0x80 [ 393.621095] [] sys_close+0x96/0x140 [ 393.621095] [] setup_new_exec+0x18c/0x250 [ 393.621095] [] load_elf_binary+0x3d0/0xf20 [ 393.621095] [] search_binary_handler+0x80/0x1c0 [ 393.621095] [] do_execve+0x236/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #1 (&p->cred_guard_mutex){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] mutex_lock_common+0x4c/0x320 [ 393.621095] [] mutex_lock_interruptible_nested+0x3f/0x50 [ 393.621095] [] prepare_bprm_creds+0x71/0x130 [ 393.621095] [] do_execve+0x6d/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #0 (SCRIBE_RES_TYPE_PID){++++++}: [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48 [ 393.621095] [ 393.621095] other info that might help us debug this: [ 393.621095] [ 393.621095] 1 lock held by killall5/1289: [ 393.621095] #0: (SCRIBE_RES_TYPE_FILE){+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] stack backtrace: [ 393.621095] Pid: 1289 (32), comm: killall5 Not tainted 2.6.35-scribe+ #25 [ 393.621095] Call Trace: [ 393.621095] [] print_circular_bug+0xc2/0xd0 [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] ? lock_object+0x2a0/0xa60 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] ? __lock_object+0x2a0/0xa60 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] ? mark_held_locks+0x62/0x90 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? scribe_get_mapped_res+0x31/0x1d0 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] ? next_tgid+0x88/0x3d0 [ 393.621095] [] ? __mutex_unlock_slowpath+0xab/0x120 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] ? proc_base_instantiate+0x0/0x100 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48

nviennot commented 13 years ago

The dmsg message is indeed not relevant. On Aug 19, 2011 9:15 PM, "chiache" < reply@reply.github.com> wrote:

This is the "modified replay" of upstart. The logs are located at hp05:/persisit/need-diagnose/upstart-expectfork

Original log: hp05:/persist/need-diagnose/upstart-expectfork/out.log Modified log: hp05:/persisit/need-diagnose/upstart-expectfork/out.1.log (all 4 logs reproduce the same problem)

backtrace:

INFO:root: running pre-replay callback... INFO:root: replaying ... INFO:root:timeout: 45, kill: True INFO:root:context.wait() [3] INFO:root: Backtrace (50 events): [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Last events of each known process: [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Replay Diverged, err = 35 (Resource deadlock avoided) RACE 1: replay deadlock

A possible deadlock shows up in dmesg (might not be relevant):

[ 393.617517] ======================================================= [ 393.621095] [ INFO: possible circular locking dependency detected ] [ 393.621095] 2.6.35-scribe+ #25 [ 393.621095] ------------------------------------------------------- [ 393.621095] killall5/1289 is trying to acquire lock: 393.621095{++++++}, at: [] lock_object+0x2a0/0xa60 [ 393.621095] [ 393.621095] but task is already holding lock: 393.621095{+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] which lock already depends on the new lock. [ 393.621095] [ 393.621095] [ 393.621095] the existing dependency chain (in reverse order) is: [ 393.621095] [ 393.621095] -> #2 (SCRIBE_RES_TYPE_FILE){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_write_nested+0x55/0x80 [ 393.621095] [] lock_object+0x774/0xa60 [ 393.621095] [] __scribe_lock_objects+0x5a/0x150 [ 393.621095] [] lock_file+0xf7/0x140 [ 393.621095] [] scribe_pre_fput+0xbe/0xd0 [ 393.621095] [] fput+0x1a/0x240 [ 393.621095] [] filp_close+0x4c/0x80 [ 393.621095] [] sys_close+0x96/0x140 [ 393.621095] [] setup_new_exec+0x18c/0x250 [ 393.621095] [] load_elf_binary+0x3d0/0xf20 [ 393.621095] [] search_binary_handler+0x80/0x1c0 [ 393.621095] [] do_execve+0x236/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #1 (&p->cred_guard_mutex){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] mutex_lock_common+0x4c/0x320 [ 393.621095] [] mutex_lock_interruptible_nested+0x3f/0x50 [ 393.621095] [] prepare_bprm_creds+0x71/0x130 [ 393.621095] [] do_execve+0x6d/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #0 (SCRIBE_RES_TYPE_PID){++++++}: [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48 [ 393.621095] [ 393.621095] other info that might help us debug this: [ 393.621095] [ 393.621095] 1 lock held by killall5/1289: [ 393.621095] #0: (SCRIBE_RES_TYPE_FILE){+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] stack backtrace: [ 393.621095] Pid: 1289 (32), comm: killall5 Not tainted 2.6.35-scribe+

25

[ 393.621095] Call Trace: [ 393.621095] [] print_circular_bug+0xc2/0xd0 [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] ? lock_object+0x2a0/0xa60 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] ? __lock_object+0x2a0/0xa60 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] ? mark_held_locks+0x62/0x90 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? scribe_get_mapped_res+0x31/0x1d0 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] ? next_tgid+0x88/0x3d0 [ 393.621095] [] ? __mutex_unlock_slowpath+0xab/0x120 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] ? proc_base_instantiate+0x0/0x100 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48

Reply to this email directly or view it on GitHub: https://github.com/columbia/racepro/issues/10

nviennot commented 13 years ago

About the trace I cannot really do anything with it since its a modified log. The problem is probably the log file On Aug 19, 2011 9:21 PM, "Nicolas Viennot" nicolas@viennot.biz wrote:

The dmsg message is indeed not relevant. On Aug 19, 2011 9:15 PM, "chiache" < reply@reply.github.com> wrote:

This is the "modified replay" of upstart. The logs are located at hp05:/persisit/need-diagnose/upstart-expectfork

Original log: hp05:/persist/need-diagnose/upstart-expectfork/out.log Modified log: hp05:/persisit/need-diagnose/upstart-expectfork/out.1.log (all 4 logs reproduce the same problem)

backtrace:

INFO:root: running pre-replay callback... INFO:root: replaying ... INFO:root:timeout: 45, kill: True INFO:root:context.wait() [3] INFO:root: Backtrace (50 events): [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Last events of each known process: [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000008, ecx: b93f91b8, edx: 00000800 esi: b93f91b8, edi: 00000008, ebp: bfcc20d8, esp: bfcc2088 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 532, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 191, serial = 16, desc = socket:[138024] [01] resource unlock, id = 191 [01] syscall ended [01] close() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000006, ebx: 00000008, ecx: b77283c0, edx: b7784ff4 esi: 00000008, edi: 00000000, ebp: bfcc2078, esp: bfcc204c ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = write, id = 3, serial = 561, desc = none [01] resource unlock, id = 3 [01] syscall ended [01] write() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000004, ebx: 00000004, ecx: b77a1228, edx: 00000001 esi: b77a1228, edi: 00000000, ebp: bfcc2148, esp: bfcc2118 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 156, serial = 8, desc = pipe:[138019] [01] resource unlock, id = 156 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 24, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 25, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = 1 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 26, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] read() = -11 EAGAIN (Resource temporarily unavailable) [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 00000003, ebx: 00000003, ecx: bfcc248f, edx: 00000001 esi: bfcc248f, edi: bfcc2370, ebp: bfcc24a8, esp: bfcc2288 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] resource lock, type = files_struct, access = read, id = 3, serial = 562, desc = none [01] resource unlock, id = 3 [01] resource lock, type = file, access = write, id = 155, serial = 27, desc = pipe:[138019] [01] resource unlock, id = 155 [01] syscall ended [01] waitid() = 0 [01] regs: eip: 0073:b77ad424, eflags: 00000246, eax: 0000011c, ebx: 00000000, ecx: 00000000, edx: bfcc2208 esi: 0000000f, edi: 00000000, ebp: bfcc21b8, esp: bfcc2188 ds: 007b, es: 007b, fs: 0000, gs: 0033, ss: 007b [01] data: internal, ptr = (nil), size = 4, 00000006 [01] resource lock, type = pid, access = write, id = 175, serial = 2, desc = none Replay Diverged, err = 35 (Resource deadlock avoided) RACE 1: replay deadlock

A possible deadlock shows up in dmesg (might not be relevant):

[ 393.617517] ======================================================= [ 393.621095] [ INFO: possible circular locking dependency detected ] [ 393.621095] 2.6.35-scribe+ #25 [ 393.621095] ------------------------------------------------------- [ 393.621095] killall5/1289 is trying to acquire lock: 393.621095{++++++}, at: [] lock_object+0x2a0/0xa60 [ 393.621095] [ 393.621095] but task is already holding lock: 393.621095{+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] which lock already depends on the new lock. [ 393.621095] [ 393.621095] [ 393.621095] the existing dependency chain (in reverse order) is: [ 393.621095] [ 393.621095] -> #2 (SCRIBE_RES_TYPE_FILE){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_write_nested+0x55/0x80 [ 393.621095] [] lock_object+0x774/0xa60 [ 393.621095] [] __scribe_lock_objects+0x5a/0x150 [ 393.621095] [] lock_file+0xf7/0x140 [ 393.621095] [] scribe_pre_fput+0xbe/0xd0 [ 393.621095] [] fput+0x1a/0x240 [ 393.621095] [] filp_close+0x4c/0x80 [ 393.621095] [] sys_close+0x96/0x140 [ 393.621095] [] setup_new_exec+0x18c/0x250 [ 393.621095] [] load_elf_binary+0x3d0/0xf20 [ 393.621095] [] search_binary_handler+0x80/0x1c0 [ 393.621095] [] do_execve+0x236/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #1 (&p->cred_guard_mutex){+.+.+.}: [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] mutex_lock_common+0x4c/0x320 [ 393.621095] [] mutex_lock_interruptible_nested+0x3f/0x50 [ 393.621095] [] prepare_bprm_creds+0x71/0x130 [ 393.621095] [] do_execve+0x6d/0x390 [ 393.621095] [] sys_execve+0x37/0x70 [ 393.621095] [] ptregs_execve+0x12/0x18 [ 393.621095] [ 393.621095] -> #0 (SCRIBE_RES_TYPE_PID){++++++}: [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48 [ 393.621095] [ 393.621095] other info that might help us debug this: [ 393.621095] [ 393.621095] 1 lock held by killall5/1289: [ 393.621095] #0: (SCRIBE_RES_TYPE_FILE){+.+.+.}, at: [] lock_object+0x774/0xa60 [ 393.621095] [ 393.621095] stack backtrace: [ 393.621095] Pid: 1289 (32), comm: killall5 Not tainted 2.6.35-scribe+

25

[ 393.621095] Call Trace: [ 393.621095] [] print_circular_bug+0xc2/0xd0 [ 393.621095] [] lock_acquire+0x1104/0x1730 [ 393.621095] [] lock_acquire+0x89/0x100 [ 393.621095] [] ? lock_object+0x2a0/0xa60 [ 393.621095] [] down_read_nested+0x55/0x70 [ 393.621095] [] ? __lock_object+0x2a0/0xa60 [ 393.621095] [] lock_object+0x2a0/0xa60 [ 393.621095] [] ? mark_held_locks+0x62/0x90 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? local_bh_enable+0x69/0xd0 [ 393.621095] [] ? scribe_get_mapped_res+0x31/0x1d0 [ 393.621095] [] scribe_lock_object+0xf/0x20 [ 393.621095] [] lock_pid+0x68/0x80 [ 393.621095] [] scribe_lock_pid_read+0x12/0x20 [ 393.621095] [] next_tgid+0xed/0x3d0 [ 393.621095] [] ? next_tgid+0x88/0x3d0 [ 393.621095] [] ? __mutex_unlock_slowpath+0xab/0x120 [ 393.621095] [] proc_pid_readdir+0x133/0x1d0 [ 393.621095] [] ? proc_base_instantiate+0x0/0x100 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] proc_root_readdir+0x47/0x60 [ 393.621095] [] vfs_readdir+0x86/0xa0 [ 393.621095] [] ? filldir+0x0/0x2a0 [ 393.621095] [] sys_getdents+0x78/0x140 [ 393.621095] [] sysenter_do_call+0x1d/0x48

Reply to this email directly or view it on GitHub: https://github.com/columbia/racepro/issues/10

nviennot commented 13 years ago

waitid() has a different semantic than waitpid(). This is fixed.