ewwhite / zfs-ha

ZFS High-Availability NAS

PCS cannot unmount file system during failover event #16

Closed: intentions closed this 6 years ago

intentions commented 6 years ago

While migrating data onto my new ZFS system, I attempted a failover so I could do some work on one of the heads. The process failed, with pcs unable to unmount the ZFS file system. I tried unmounting by hand and got:

```
root@scifs1701:~] zpool export -f expphyvol
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
cannot unmount '/expphyvol/hallc': umount failed
root@scifs1701:~] umount -f /expphyvol/hallc/
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
```

Looking at pcs, this happened after the IP had already been shut down, so I don't know how new writes could still be reaching the device.
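One thing worth knowing here: `lsof <dir>` only lists userspace processes with open files, and kernel-side holders (such as nfsd's export of the filesystem) never show up in it. A hedged diagnostic sketch, using the mountpoint from the report above:

```shell
#!/bin/sh
# Mountpoint taken from the error output above.
MNT=/expphyvol/hallc

# fuser -m checks the whole mounted filesystem, not just the directory
# inode that `lsof <dir>` looks at; -v lists any holders verbosely.
fuser -vm "$MNT" || true

# An active kernel NFS export pins the filesystem in a way lsof/fuser
# cannot see; check whether nfsd still has the path exported.
exportfs -v | grep "$MNT" || true
```

If `exportfs -v` still shows the path, the "busy" holder is nfsd itself rather than any process.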

ewwhite commented 6 years ago

Did you check the output of `lsof /expphyvol/hallc`?

intentions commented 6 years ago

`lsof` returns nothing.

I asked about this on the ZFS mailing list and got one response of "yeah, it happens sometimes", so I'm guessing it isn't a problem with the Pacemaker setup.

colttt commented 6 years ago

Hello,

I have the same issue. Stop the nfs-server before you export the zfs pool, and start it again before you import it. I was wondering why this doesn't happen in this how-to.
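If the cluster is supposed to do this ordering itself, it could be expressed as Pacemaker constraints; a sketch with purely hypothetical resource names ("nfs-server" as a systemd-managed cluster resource, "zfs-pool" for the pool resource from this how-to):

```shell
# Stop nfs-server before the pool resource stops; with the default
# symmetrical ordering, the reverse start order (pool first, then
# nfs-server) is implied.
pcs constraint order stop nfs-server then stop zfs-pool

# Keep the NFS server on whichever head currently holds the pool.
pcs constraint colocation add nfs-server with zfs-pool INFINITY
```

These are cluster configuration commands and assume the resources already exist under those names.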

intentions commented 6 years ago

Thanks, though the last time I restarted NFS all the clients yelled about stale file handles and I had to reboot the head anyway.

I'm closing this because it now seems to be more of an issue with ZFS than with what PCS is doing.

colttt commented 6 years ago

That's not an issue with ZFS! It's an issue with NFS: the TCP connections sitting in TIME_WAIT don't get torn down (they can't be if the interface is down), so they linger for roughly 2-4 minutes before timing out. You can decrease those parameters, but I don't remember which ones... sorry.
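The lingering TCP state colttt describes can be observed directly; a sketch, assuming nfsd on its standard port 2049 (the sysctls shown are real Linux knobs, but whether they are the parameters colttt had in mind is a guess):

```shell
#!/bin/sh
# Count connections to/from nfsd (port 2049) still in TIME_WAIT.
TW_COUNT=$(ss -tan state time-wait '( sport = :2049 or dport = :2049 )' | wc -l)
echo "sockets still in TIME_WAIT toward nfsd: $TW_COUNT"

# On Linux the TIME_WAIT interval itself is a compile-time constant
# (about 60 s); these are related, genuinely tunable sysctls, listed
# here only as candidates:
sysctl net.ipv4.tcp_tw_reuse || true
sysctl net.ipv4.tcp_fin_timeout || true
```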

ewwhite commented 6 years ago

Are you using NFSv3 or NFSv4? For NFSv3, I find it's good enough to keep the NFS daemon enabled and running on both hosts. The `zpool export` handles client notification, unexporting of the NFS share, and the re-export, all in one action.
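This "leave nfsd running on both heads" pattern leans on ZFS-managed shares; a minimal sketch of what that looks like (dataset and pool names are from this thread, the export options string is purely an illustrative assumption):

```shell
# Define the NFS export as a ZFS property instead of /etc/exports;
# the client subnet here is a made-up example.
zfs set sharenfs='rw=@192.168.10.0/24' expphyvol/hallc

# Keep nfsd enabled and running on BOTH heads, outside cluster control.
systemctl enable --now nfs-server

# On failover, export/import then unshares/reshares the dataset through
# the already-running nfsd automatically.
zpool export expphyvol
```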

intentions commented 6 years ago

NFS v3.

During my initial testing (10 odd clients) I didn't see any problems, but once the system entered production use (~900 clients) I started seeing this problem.

colttt commented 6 years ago

We use NFS v4.2 (TCP). If you use NFS v3 over 10G you have a high risk of data loss (because it's UDP).
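For what it's worth, NFSv3 can also run over TCP; a client can force it explicitly in the mount options (server and export path are the ones from this thread, the local mount target is a placeholder):

```shell
# Force NFSv3 over TCP on the client side.
mount -t nfs -o vers=3,proto=tcp scifs1701:/expphyvol/hallc /mnt/hallc
```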

intentions commented 6 years ago

Data is going out over 56G FDR (though the clients are all on 40G QDR), and we are using TCP.