Open wiedi opened 5 years ago
Just happend again and this time I got a dump and some more clues.
> ::status
debugging crash dump vmcore.1 (64-bit) from cube-sw
operating system: 5.11 joyent_20181227T154655Z (i86pc)
image uuid: (not set)
panic message: assertion failed: epp->ep_state & UGEN_EP_STATE_INTR_IN_POLLING_ON, file: ../../common/io/usb/usba/usba_ugen.c, line: 2816
dump content: kernel pages only
> $C
fffffe000f7588c0 vpanic()
fffffe000f7588f0 0xfffffffffbe157f8()
fffffe000f758970 ugen_epx_intr_IN_req+0x160(fffffe0c800d9000, fffffe0c800d9b70, fffffe0bc8d70dd8, fffffe000f7589ac)
fffffe000f7589f0 ugen_epx_req+0x446(fffffe0c800d9000, fffffe0bc8d70dd8)
fffffe000f758a70 ugen_strategy+0x20b(fffffe0bc8d70dd8)
fffffe000f758b50 default_physio+0x33d(fffffffff7aac710, 0, fa00000004, 40, fffffffff7aac930, fffffe000f758e50)
fffffe000f758b90 physio+0x25(fffffffff7aac710, 0, fa00000004, 40, fffffffff7aac930, fffffe000f758e50)
fffffe000f758bd0 usb_ugen_read+0x68(fffffe0bc21d1008, fa00000004, fffffe000f758e50, fffffe0c3b592118)
fffffe000f758c20 ugen_skel_read+0x5e(fa00000004, fffffe000f758e50, fffffe0c3b592118)
fffffe000f758c50 cdev_read+0x2d(fa00000004, fffffe000f758e50, fffffe0c3b592118)
fffffe000f758cf0 spec_read+0x263(fffffe12dc480400, fffffe000f758e50, 0, fffffe0c3b592118, 0)
fffffe000f758d90 fop_read+0xf3(fffffe12dc480400, fffffe000f758e50, 0, fffffe0c3b592118, 0)
fffffe000f758ef0 read+0x2b8(a, fffffc7fffdffbd0, 14)
fffffe000f758f00 sys_syscall+0x2a5()
> ::panicinfo
cpu 1
thread fffffe0bcfc31080
message assertion failed: epp->ep_state & UGEN_EP_STATE_INTR_IN_POLLING_ON, file: ../../common/io/usb/usba/usba_ugen.c, line: 2816
rdi fffffffffbf66cb8
rsi fffffe000f758860
rdx fffffffff7ab99c8
rcx b00
r8 1
r9 0
rax fffffe000f758880
rbx fffffe0c800d9b70
rbp fffffe000f7588c0
r10 303d746e756f635f
r11 fffffe000f758581
r12 fffffe0bc8d70dd8
r13 fffffe0c800d9000
r14 fffffe000f7589ac
r15 0
fsbase 0
gsbase fffffe0bbacd7000
ds 4b
es 4b
fs 0
gs 1c3
trapno 0
err 0
rip fffffffffb88b4b0
cs 30
rflags 286
rsp fffffe000f758858
ss 38
gdt_hi 0
gdt_lo c00001ef
idt_hi 0
idt_lo f0000fff
ldt 0
task 70
cr0 80050033
cr2 56b010
cr3 1fa4ed000
cr4 6f8
> ::cpuinfo -v
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
0 fffffffffbc5b000 1f 5 0 59 yes no t-1 fffffe0c7e7ff440 io_ops_
| |
RUNNING <--+ +--> PRI THREAD PROC
READY 60 fffffe0010291c20 sched
QUIESCED 59 fffffe0bd0e58040 io_ops_
EXISTS 56 fffffe0c14020820 node
ENABLE 1 fffffe0c7c5fd0c0 munin-update
1 fffffe0c7c2560c0 vmadm
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
1 fffffffffbc60900 1b 3 0 54 no no t-0 fffffe0bcfc31080 apcupsd
| |
RUNNING <--+ +--> PRI THREAD PROC
READY 60 fffffe000fc51c20 sched
EXISTS 1 fffffe0c137804a0 vmadm
ENABLE 1 fffffe0bb914c060 uptime
That the panic is happening inside usba
and that apcupsd
was running suggests that this is due to one of my own experiments.
The system has an APC UPS connected via USB and delegated to a zone like this:
zonecfg -z 08f8b95c-417e-4910-f137-f25760c16b14
add device
set match=/dev/usb/51d.2/0/*
end
commit
The apcupsd
running inside that zone was installed via pkgsrc and using the default config only modified with UPSTYPE usb
.
If there is interest I can share the dump or collect more information, otherwise I'll just switch back to a non-debug kernel for now.
On 1/18/19 8:36 , Sebastian Wiedenroth wrote:
Just happend again and this time I got a dump and some more clues.
> ::status debugging crash dump vmcore.1 (64-bit) from cube-sw operating system: 5.11 joyent_20181227T154655Z (i86pc) image uuid: (not set) panic message: assertion failed: epp->ep_state & UGEN_EP_STATE_INTR_IN_POLLING_ON, file: ../../common/io/usb/usba/usba_ugen.c, line: 2816 dump content: kernel pages only > $C fffffe000f7588c0 vpanic() fffffe000f7588f0 0xfffffffffbe157f8() fffffe000f758970 ugen_epx_intr_IN_req+0x160(fffffe0c800d9000, fffffe0c800d9b70, fffffe0bc8d70dd8, fffffe000f7589ac) fffffe000f7589f0 ugen_epx_req+0x446(fffffe0c800d9000, fffffe0bc8d70dd8) fffffe000f758a70 ugen_strategy+0x20b(fffffe0bc8d70dd8) fffffe000f758b50 default_physio+0x33d(fffffffff7aac710, 0, fa00000004, 40, fffffffff7aac930, fffffe000f758e50) fffffe000f758b90 physio+0x25(fffffffff7aac710, 0, fa00000004, 40, fffffffff7aac930, fffffe000f758e50) fffffe000f758bd0 usb_ugen_read+0x68(fffffe0bc21d1008, fa00000004, fffffe000f758e50, fffffe0c3b592118) fffffe000f758c20 ugen_skel_read+0x5e(fa00000004, fffffe000f758e50, fffffe0c3b592118) fffffe000f758c50 cdev_read+0x2d(fa00000004, fffffe000f758e50, fffffe0c3b592118) fffffe000f758cf0 spec_read+0x263(fffffe12dc480400, fffffe000f758e50, 0, fffffe0c3b592118, 0) fffffe000f758d90 fop_read+0xf3(fffffe12dc480400, fffffe000f758e50, 0, fffffe0c3b592118, 0) fffffe000f758ef0 read+0x2b8(a, fffffc7fffdffbd0, 14) fffffe000f758f00 sys_syscall+0x2a5() > ::panicinfo cpu 1 thread fffffe0bcfc31080 message assertion failed: epp->ep_state & UGEN_EP_STATE_INTR_IN_POLLING_ON, file: ../../common/io/usb/usba/usba_ugen.c, line: 2816 rdi fffffffffbf66cb8 rsi fffffe000f758860 rdx fffffffff7ab99c8 rcx b00 r8 1 r9 0 rax fffffe000f758880 rbx fffffe0c800d9b70 rbp fffffe000f7588c0 r10 303d746e756f635f r11 fffffe000f758581 r12 fffffe0bc8d70dd8 r13 fffffe0c800d9000 r14 fffffe000f7589ac r15 0 fsbase 0 gsbase fffffe0bbacd7000 ds 4b es 4b fs 0 gs 1c3 trapno 0 err 0 rip fffffffffb88b4b0 cs 30 rflags 286 rsp fffffe000f758858 ss 38 gdt_hi 0 gdt_lo c00001ef idt_hi 0 idt_lo f0000fff ldt 0 task 70 cr0 80050033 cr2 56b010 cr3 1fa4ed000 cr4 6f8 > ::cpuinfo -v ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 0 fffffffffbc5b000 1f 5 0 59 yes no t-1 fffffe0c7e7ff440 io_ops_ | | RUNNING <--+ +--> PRI THREAD PROC READY 60 fffffe0010291c20 sched QUIESCED 59 fffffe0bd0e58040 io_ops_ EXISTS 56 fffffe0c14020820 node ENABLE 1 fffffe0c7c5fd0c0 munin-update 1 fffffe0c7c2560c0 vmadm ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 1 fffffffffbc60900 1b 3 0 54 no no t-0 fffffe0bcfc31080 apcupsd | | RUNNING <--+ +--> PRI THREAD PROC READY 60 fffffe000fc51c20 sched EXISTS 1 fffffe0c137804a0 vmadm ENABLE 1 fffffe0bb914c060 uptime
That the panic is happening inside
usba
and thatapcupsd
was running suggests that this is due to one of my own experiments.The system has an APC UPS connected via USB and delegated to a zone like this:
zonecfg -z 08f8b95c-417e-4910-f137-f25760c16b14 add device set match=/dev/usb/51d.2/0/* end commit
The
apcupsd
running inside that zone was installed via pkgsrc and using the default config only modified withUPSTYPE usb
.If there is interest I can share the dump or collect more information, otherwise I'll just switch back to a non-debug kernel for now.
While I haven't had a chance to look at the surrounding code, I think we need to figure out whether that ASSERT is valid or not and fix accordingly.
Had the following panic:
The system was running an joyent_20181227T154655Z AMD test image. Sadly the dump could not be created. I've since resized zones/dump so maybe I’ll catch it when it happens again.