TritonDataCenter / smartos-live

For more information, please see http://smartos.org/. For any questions that aren't answered there, please join the SmartOS discussion list: https://smartos.topicbox.com/groups/smartos-discuss

chown -Rf on zfs triggers vmdump #854

Closed (sjorge closed this issue 4 years ago)

sjorge commented 4 years ago

I will upload the dump later. I'm not sure whether this is SmartOS-specific or a general illumos problem. I was running chown -Rf as a first step to reset some messed-up ACLs, and the box is now writing a vmdump.

sjorge commented 4 years ago

It looks like the entire pool is corrupt; the box seems to be stuck at the pool import step. It was my data pool, not zones, so the box might still make it into SmartOS, which would let me collect some more info.

Edit: pulled all the disks of the data pool, hit smf imported step...

Edit2: running a scrub now; I reinserted the disks and zpool import found the pool...

sjorge commented 4 years ago

Script I can use to trigger the vmdump:

## archive
SHARE=/archive
# set owner
/bin/chown -Rf nobody:media_rw ${SHARE}
/bin/chmod 0070 ${SHARE}
/bin/chmod g+s ${SHARE}
/bin/find ${SHARE}/ -type d -mindepth 1 -maxdepth 3 \
  -not -path "${SHARE}/.\$EXTEND*" \
  -not -path "${SHARE}/.zfs*" \
  -exec /bin/chmod g+s {} \+
# cleanup existing acl
/bin/chmod A- ${SHARE}
/bin/chmod A3- ${SHARE}
/bin/chmod A2- ${SHARE}
/bin/chmod A1- ${SHARE}
# set base acl
/bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE}
/bin/chmod A+group@:r-----a-R-c--s:f------:allow ${SHARE}
/bin/chmod A+group@:r-x---a-R-c--s:-d-----:allow ${SHARE}
# set extended acl
/bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE}
/bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE}

## movies and music
for SHARE in /archive/movies /archive/music; do
  # cleanup existing acl
  /bin/chmod A- ${SHARE}
  /bin/chmod A3- ${SHARE}
  /bin/chmod A2- ${SHARE}
  /bin/chmod A1- ${SHARE}
  # set base acl
  /bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE}
  /bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE}
  /bin/chmod A+group@:rw-pdDaARWcCos:f------:allow ${SHARE}
  /bin/chmod A+group@:rwxpdDaARWcCos:-d-----:allow ${SHARE}
  # set extended acl
  /bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE}
  /bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE}
  # propagate acl
  /bin/mkdir ${SHARE}/.zfs_acl_dir
  /bin/find ${SHARE}/ -type d -mindepth 1 \
    -not -path "${SHARE}/.\$EXTEND*" \
    -not -path "${SHARE}/.zfs*" \
    -exec cpacl ${SHARE}/.zfs_acl_dir {} \+
  /bin/rmdir ${SHARE}/.zfs_acl_dir
  /bin/touch ${SHARE}/.zfs_acl_file
  /bin/find ${SHARE}/ -type f -mindepth 1 \
    -not -path "${SHARE}/.\$EXTEND*" \
    -not -path "${SHARE}/.zfs*" \
    -exec cpacl ${SHARE}/.zfs_acl_file {} \+
  /bin/rm ${SHARE}/.zfs_acl_file
done
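
A minimal sketch, not part of the original report, for anyone who wants to see which directories the propagation step would visit before running the script on a pool they care about. It reuses the same exclusion predicates but prints paths instead of changing anything. The helper name list_acl_targets is hypothetical, and the sketch assumes a find that supports -mindepth and -not, as the script above does.

```shell
#!/bin/sh
# Dry run: list the directories the ACL propagation step would touch,
# without changing ownership, modes, or ACLs.
# list_acl_targets is a hypothetical helper, not part of the original script.
list_acl_targets() {
  share="$1"
  # Same exclusions as the original find. -print replaces -exec so nothing
  # is modified. No trailing slash on the start path, so printed paths
  # contain a single separator.
  find "${share}" -mindepth 1 -type d \
    -not -path "${share}/.\$EXTEND*" \
    -not -path "${share}/.zfs*" \
    -print
}
```

Calling list_acl_targets /archive/music would print one directory per line. Swapping -type d for -type f gives the matching list for the file pass.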

Stack from the dump. The dump is about 2G, so the upload will take most of the night.

savecore: 2019-10-20T20:29:25.766090+00:00 carbon savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe00f9157ec0 addr=1 occurred in module "<unknown>" due to a NULL pointer dereference
System dump time: Sun Oct 20 20:19:06 2019
[root@carbon /var/crash/volatile]# mdb -k unix.1 vmcore.1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata random cpc logindmux ptm sppp nfs ]
> $c
1()
sa_build_layouts+0x23d(fffffeb3ac722790, fffffeb31570b0c0, e, fffffeb3ac6e62c0)
sa_modify_attrs+0x2e8(fffffeb3ac722790, 1, 3, 1, 8, 47)
sa_attr_op+0xf3(fffffeb3ac722790, fffffe00f9158440, 8, 1, fffffeb3ac6e62c0)
sa_bulk_update_impl+0x6d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0)
sa_bulk_update+0x4d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0)
zfs_setattr_dir+0x24f(fffffeb3ac71cb28)
zfs_setattr+0x1ad6(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fop_setattr+0x91(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
lo_setattr+0x1b(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fop_setattr+0x91(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0)
fsetattrat+0x147(ffd19553, fed14042, 0, fffffe00f9158d90)
fchownat+0xc8(ffd19553, fed14042, ea61, 8a3, 0)
chown+0x1f(fed14042, ea61, 8a3)
_sys_sysenter_post_swapgs+0x159()

sjorge commented 4 years ago

From the stack, it looks like this probably affects upstream too, so I'm closing this and have opened https://www.illumos.org/issues/11856