Closed by sjorge 4 years ago
Looks like the entire pool is corrupt; boot seems to be stuck at the pool import step. It was my data pool, so it is not the zones pool. The box might still make it into SmartOS so I can collect some more info.
Edit: pulled all the disks of the data pool, and boot then got to the SMF import step. Edit 2: re-inserted the disks, zpool import found the pool, and a scrub is running now.
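The recovery steps from the edits above can be sketched roughly as follows. This is only an illustration: the pool name `data` is an assumption (the thread never names the pool), and `run` and `check_pool` are hypothetical helpers that print each command and execute it only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Assumption: the pool is called "data"; the thread never names it.
POOL="${POOL:-data}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

check_pool() {
    run zpool import             # list pools that are available for import
    run zpool import "$POOL"     # import the pool by name
    run zpool scrub "$POOL"      # start a scrub of the imported pool
    run zpool status -v "$POOL"  # scrub progress plus any files with errors
}

# Dry run by default: only prints the commands.
check_pool
```

Calling `DRY_RUN=0 check_pool` would actually run the commands; the default just shows what would be executed.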
Script I can use to trigger the vmdump:
## archive
SHARE=/archive
# set owner
/bin/chown -Rf nobody:media_rw ${SHARE}
/bin/chmod 0070 ${SHARE}
/bin/chmod g+s ${SHARE}
/bin/find ${SHARE}/ -type d -mindepth 1 -maxdepth 3 \
-not -path "${SHARE}/.\$EXTEND*" \
-not -path "${SHARE}/.zfs*" \
-exec /bin/chmod g+s {} \+
# cleanup existing acl
/bin/chmod A- ${SHARE}
/bin/chmod A3- ${SHARE}
/bin/chmod A2- ${SHARE}
/bin/chmod A1- ${SHARE}
# set base acl
/bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+group@:r-----a-R-c--s:f------:allow ${SHARE}
/bin/chmod A+group@:r-x---a-R-c--s:-d-----:allow ${SHARE}
# set extended acl
/bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE}
/bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE}
## movies and music
for SHARE in /archive/movies /archive/music; do
# cleanup existing acl
/bin/chmod A- ${SHARE}
/bin/chmod A3- ${SHARE}
/bin/chmod A2- ${SHARE}
/bin/chmod A1- ${SHARE}
# set base acl
/bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+group@:rw-pdDaARWcCos:f------:allow ${SHARE}
/bin/chmod A+group@:rwxpdDaARWcCos:-d-----:allow ${SHARE}
# set extended acl
/bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE}
/bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE}
# propagate acl
/bin/mkdir ${SHARE}/.zfs_acl_dir
/bin/find ${SHARE}/ -type d -mindepth 1 \
-not -path "${SHARE}/.\$EXTEND*" \
-not -path "${SHARE}/.zfs*" \
-exec cpacl ${SHARE}/.zfs_acl_dir {} \+
/bin/rmdir ${SHARE}/.zfs_acl_dir
/bin/touch ${SHARE}/.zfs_acl_file
/bin/find ${SHARE}/ -type f -mindepth 1 \
-not -path "${SHARE}/.\$EXTEND*" \
-not -path "${SHARE}/.zfs*" \
-exec cpacl ${SHARE}/.zfs_acl_file {} \+
/bin/rm ${SHARE}/.zfs_acl_file
done
Stack from the dump. The dump is ~2 GB, so the upload will take most of the night.
savecore: 2019-10-20T20:29:25.766090+00:00 carbon savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe00f9157ec0 addr=1 occurred in module "<unknown>" due to a NULL pointer dereference
System dump time: Sun Oct 20 20:19:06 2019
[root@carbon /var/crash/volatile]# mdb -k unix.1 vmcore.1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata random cpc logindmux ptm sppp nfs ]
> $c
1()
sa_build_layouts+0x23d(fffffeb3ac722790, fffffeb31570b0c0, e, fffffeb3ac6e62c0)
sa_modify_attrs+0x2e8(fffffeb3ac722790, 1, 3, 1, 8, 47)
sa_attr_op+0xf3(fffffeb3ac722790, fffffe00f9158440, 8, 1, fffffeb3ac6e62c0)
sa_bulk_update_impl+0x6d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0)
sa_bulk_update+0x4d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0)
zfs_setattr_dir+0x24f(fffffeb3ac71cb28)
zfs_setattr+0x1ad6(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fop_setattr+0x91(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
lo_setattr+0x1b(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fop_setattr+0x91(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fsetattrat+0x147(ffd19553, fed14042, 0, fffffe00f9158d90)
fchownat+0xc8(ffd19553, fed14042, ea61, 8a3, 0)
chown+0x1f(fed14042, ea61, 8a3)
_sys_sysenter_post_swapgs+0x159()
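For anyone looking at the same dump, a few more standard mdb dcmds can be collected non-interactively. A minimal sketch, assuming dump number 1 as in the session above; `dump_dcmds` and the `panic-summary.N.txt` output name are made up for this example.

```shell
#!/bin/sh
# Assumption: dump number 1, matching unix.1 / vmcore.1 in the session above.
N="${N:-1}"

# dcmds to feed to mdb on stdin, one per line.
dump_dcmds() {
    cat <<'EOF'
::status
::panicinfo
::msgbuf
$C
EOF
}

# Only touch the dump if mdb and both files are present.
if command -v mdb >/dev/null 2>&1 && [ -r "unix.$N" ] && [ -r "vmcore.$N" ]; then
    dump_dcmds | mdb -k "unix.$N" "vmcore.$N" > "panic-summary.$N.txt"
else
    echo "mdb or unix.$N/vmcore.$N not found; would run:"
    dump_dcmds
fi
```

`::status` and `::panicinfo` summarize the panic, `::msgbuf` shows the console messages leading up to it, and `$C` prints the stack with frame pointers.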
From the stack, this looks like it is probably upstream too, so I am closing this and have opened https://www.illumos.org/issues/11856
Will upload the dump later; not sure if this is SmartOS-specific or general to illumos. I was running chown -Rf as a first step to reset some messed-up ACLs, and the box is writing a vmdump now.
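Since the panic hit during the initial chown -Rf, one way to narrow down which object triggers it would be to change ownership one path at a time and log each path before touching it. This is a sketch only: `chown_logged` is a hypothetical helper, the log should live on a different pool than the one being modified, and nothing here is confirmed to reproduce the panic.

```shell
#!/bin/sh
# Hypothetical helper: chown one path at a time and record each path *before*
# touching it, so the last line of the log names the object that was being
# changed when the box went down.
# Usage: chown_logged OWNER[:GROUP] ROOT LOGFILE
# Caveat: paths containing newlines are not handled.
chown_logged() {
    owner="$1"
    root="$2"
    log="$3"
    find "$root" -print | while IFS= read -r path; do
        printf '%s\n' "$path" >> "$log"
        sync    # ask for the log to be flushed before the risky call (slow)
        chown "$owner" "$path" || printf 'FAILED %s\n' "$path" >> "$log"
    done
}

# Example (values taken from the script above; log location is an assumption):
# chown_logged nobody:media_rw /archive /var/tmp/chown.log
```

The per-path `sync` makes this slow on a large tree, so it is probably only worth running on the subtree that was being processed when the box panicked.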