stevenwilliamson opened 9 years ago
Thanks for reporting this.
The kernel type for a mutex isn't a mutex_t, but is actually a kmutex_t. However, mdb has a built-in ::mutex dcmd, so you should be able to take the address you have for the lock and run ::mutex
on it, which will tell you more about its state. That said, the spa_name member looks a little suspect there. Would it be possible to make the dump available? With it we should be able to dig in and get more information about where the spa went wrong.
Another thing you can do is run the ::whatis
dcmd on the address of the spa_t. Based on the name, I suspect it might be garbage or have been freed for some reason.
From consulting with some folks in #illumos, it appears that this is the same issue as discussed in this thread: http://comments.gmane.org/gmane.os.illumos.devel/21354
OK, ::whatis on the spa_t only yields: ffffff4447a3e9e0 is freed from anon_cache
I'm guessing this means another kernel thread freed the memory.
Running ::mutex on the mutex that we panicked on shows:
> ffffff4447a3f0e0::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff4447a3f0e0 adapt 1fffffe888f47000      -      -      no
But I'm guessing this is just stale state, given that the spa_t has been freed, and is of no help.
It sounds like it may well be the bug you have linked to, then. Is the patch mentioned in the post something that could be pulled into illumos?
If there is any doubt that it is this issue, I can still upload the dump somewhere.
Just had another crash that looks like a repeat of this issue: a disk was pulled and the system crashed.
[root@phy0-sw1-ash (ash) /var/crash/volatile]# mdb -k unix.0 vmcore.0
mdb: warning: dump is from SunOS 5.11 joyent_20150622T171240Z; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ]
> $C
ffffff01ea3216c0 vdev_disk_off_notify+0x45(ffffff4320864588, fffffffffbc997c0, ffffff431f2469c0, 0)
ffffff01ea321750 ldi_invoke_notify+0x1b1(ffffff431edd8010, fffffffffffffffe, 0, fffffffffbb5b228, 0)
ffffff01ea321790 e_ddi_offline_notify+0x75(ffffff431edd8010)
ffffff01ea321800 devi_detach_node+0x52(ffffff431edd8010, 41000)
ffffff01ea321870 ndi_devi_offline+0x89(ffffff431edd8010, 1000)
ffffff01ea321900 i_mdi_pi_state_change+0x511(ffffff431ebf4140, 4, 0)
ffffff01ea321940 mdi_pi_offline+0x39(ffffff431ebf4140, 0)
ffffff01ea3219a0 mptsas_offline_lun+0x71(ffffff431edd8560, 0, ffffff431ebf4140, 1)
ffffff01ea321a00 mptsas_offline_target+0x185(ffffff431edd8560, ffffff44b7383900)
ffffff01ea321af0 mptsas_handle_topo_change+0x3e9(ffffff43aa749790, ffffff431edd8560)
ffffff01ea321b60 mptsas_handle_dr+0x184(ffffff43aa749790)
ffffff01ea321c20 taskq_thread+0x2d0(ffffff431d454d68)
ffffff01ea321c30 thread_start+8()
> ::panicinfo
cpu 3
thread ffffff01ea321c40
message BAD TRAP: type=e (#pf Page fault) rp=ffffff01ea3215a0 addr=38 occurred in module "zfs" due to a NULL pointer dereference
rdi ffffff431f2469c0
rsi fffffffff7d6555e
rdx fffffffff7d6555f
rcx 0
r8 ffffff431edd8010
r9 0
rax 0
rbx ffffff431f2469c0
rbp ffffff01ea3216c0
r10 ffffff01ea321430
r11 0
r12 0
r13 0
r14 fffffffffffffffe
r15 fffffffffbb5b228
fsbase fffffd7ffc566a40
gsbase ffffff42eb8fda80
ds 4b
es 4b
fs 0
gs 0
trapno e
err 2
rip fffffffff7d39335
cs 30
rflags 10246
rsp ffffff01ea321690
ss 38
gdt_hi 0
gdt_lo 7000ffff
idt_hi 0
idt_lo 6000ffff
ldt 0
task 70
cr0 8005003b
cr2 38
cr3 12000000
cr4 426f8
> fffffffff7d39335::dis -a
fffffffff7d3930e movq %rdx,-0x18(%rbp)
fffffffff7d39312 movq %rsi,-0x10(%rbp)
fffffffff7d39316 movq 0x60(%rdx),%r12
fffffffff7d3931a call +0x3cb9d51 <ldi_ev_get_type>
fffffffff7d3931f movq $0xfffffffff7d6554d,%rsi
fffffffff7d39326 movq %rax,%rdi
fffffffff7d39329 call +0x3b4f4b2 <strcmp>
fffffffff7d3932e testl %eax,%eax
fffffffff7d39330 jne +0x35 <vdev_disk_off_notify+0x77>
fffffffff7d39332 movq %rbx,%rdi
fffffffff7d39335 movl $0x1,0x38(%r12)
fffffffff7d3933e call +0xdcd <vdev_disk_close>
fffffffff7d39343 movq 0x58(%rbx),%rdi
fffffffff7d39347 movq %rbx,%rsi
fffffffff7d3934a call -0x1614f <zfs_post_remove>
fffffffff7d3934f movq 0x58(%rbx),%rdi
fffffffff7d39353 movl $0x1,0x240(%rbx)
fffffffff7d3935d movl $0x2,%esi
fffffffff7d39362 call -0x36477 <spa_async_request>
fffffffff7d39367 xorl %eax,%eax
fffffffff7d39369 movq (%rsp),%rbx
Crashed due to a NULL pointer: %r12 = 0, so we try to access memory address 0x38 (matching cr2 above).
Tying this up with the code in vdev_disk.c, it looks like this is because the dvd pointer is NULL: we crash setting dvd->vd_ldi_offline = B_TRUE.
	if (strcmp(ldi_ev_get_type(ecookie), LDI_EV_OFFLINE) != 0)
		return (LDI_EV_SUCCESS);

	/*
	 * All LDI handles must be closed for the state change to succeed, so
	 * call on vdev_disk_close() to do this.
	 *
	 * We inform vdev_disk_close that it is being called from offline
	 * notify context so it will defer cleanup of LDI event callbacks and
	 * freeing of vd->vdev_tsd to the offline finalize or a reopen.
	 */
	dvd->vd_ldi_offline = B_TRUE;
	vdev_disk_close(vd);
dvd is the vdev_tsd grabbed from the vdev_t that is passed in. Looking at the vdev_t pointer passed in, everything is null:
> ffffff431f2469c0::print vdev_t
{
vdev_id = 0
vdev_guid = 0
vdev_guid_sum = 0
vdev_orig_guid = 0
vdev_asize = 0
vdev_min_asize = 0
vdev_max_asize = 0
vdev_ashift = 0
.......Snipped for clarity
Looking at the address, it looks like it has been freed by another thread:
> ffffff431f2469c0::whatis
ffffff431f2469c0 is freed from kmem_alloc_1600
So it appears there is a race condition: it looks like something else is also calling vdev_disk_close(). But I can't see what else would be calling vdev_close.
I'm not sure the patch linked to previously would resolve this specific issue, given that a disk was just pulled and no new disk added. I can't see why vdev_open would have been called, so I don't think it's a race between vdev_open/vdev_close.
@jclulow Could you take a look at this in the context of other ZFS hotplug restructuring that we've done?
The server panicked when a drive was pulled. The drive that was pulled had failed some time earlier; a new disk had been inserted into a spare bay and a "zpool replace" was used to replace the failed drive.
So when the drive was pulled from the server, it was no longer part of the zones pool.
Below is some data from an mdb session. If required, I could upload the crash dump so Joyent has access.
Looking at where it actually panicked
How do we get to mutex_owner_running from mutex_enter, and how can it cause a GP fault? mutex_owner_running does not have any arguments passed in, so what populates 0xd8(%r11)?
Am I barking up the wrong tree? Do I need to dig this low, or is it that the mutex structure passed in as part of the spa_async_request function call in the stack trace is invalid?
Is that a mutex data structure, and if so, how do I check that it's valid?
I naively tried the below, but I'm learning mdb and illumos internals as I go, and I am probably missing part of the picture. (I would appreciate a rundown of how this is triaged if it gets looked at.)