ewwhite / zfs-ha

ZFS High-Availability NAS

Host-local SLOG and L2ARC #19

Closed mwpastore closed 5 years ago

mwpastore commented 6 years ago

I'm working on a zfs-ha system design that calls for NVMe devices local to each controller for the SLOG (and possibly L2ARC). I have some strategies that I think will work for a clean failover (e.g. vdev_id aliases, zpool add and remove), but I don't think they will work for a dirty failover. Specifically, I don't think I'll be able to import the pool on the failover host if the SLOG device(s) aren't available.

I'm curious if you've given any thought to this type of configuration and if you have any suggestions, or alternatively, if you have a hard "put dual-ported SAS SSDs in your JBOD for SLOG and L2ARC and don't even think twice about it" recommendation. Thanks in advance.
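For readers following along, the clean-failover idea above (`zpool remove` before export, `zpool add` after import) could look roughly like the sketch below. It is only an illustration under assumptions: the function names are made up, and the pool name and by-vdev alias in the usage note are placeholders rather than anything from this repository.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the pool/device names
# passed in are whatever your site uses (e.g. a vdev_id alias).

# Planned failover, node giving up the pool: drop the host-local log vdev
# first so the exported pool carries no reference to it.
release_pool() {
    pool="$1"; slog="$2"
    zpool remove "$pool" "$slog" || return 1
    zpool export "$pool"
}

# Planned failover, node taking over: import the pool, then attach this
# node's own log device.
acquire_pool() {
    pool="$1"; slog="$2"
    zpool import "$pool" || return 1
    zpool add "$pool" log "$slog"
}
```

Usage would be something like `release_pool tank /dev/disk/by-vdev/slog0` on the outgoing node and `acquire_pool tank /dev/disk/by-vdev/slog0` on the incoming one. It is probably worth checking `zpool status` between the remove and the export to confirm the log vdev is really gone.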

ewwhite commented 6 years ago

L2ARC on discrete nodes works just fine out of the box. I tend to use NVMe cards as L2ARC in the individual nodes (e.g. /dev/nvme0n1). It doesn't need to be shared.

SLOG is a different story, though. I don't have a good option for that just yet.

What are your thoughts?

mwpastore commented 6 years ago

I'm trying to mentally walk through what will happen if blue crashes and leaves uncommitted writes on its SLOG. If you can bring up green with a blank, properly formatted SLOG with the same device name, I think it will work, but you will lose those writes. Maybe that's okay. But you'd need to clear blue's SLOG device before reimporting the pool on a subsequent failover, or else risk pool corruption.

Assuming all that is accurate, we just need a way to properly format the SLOG device post-failover, pre-import. Is it as simple as a blkdiscard? Is there a zpool command to properly format a log device (besides zpool add and attach)? What if the SLOG is mirrored?
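One possible shape for the dirty-failover path, offered as a sketch and not a tested procedure: `zpool import -m` is documented as allowing an import when a log device is missing (discarding whatever lived only on that log), and `zpool labelclear` removes ZFS label information from a device. Whether a `blkdiscard` leaves old labels unreadable depends on the device, so `labelclear` looks like the more direct tool. The function names and arguments below are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; pool and device names are
# placeholders supplied by the caller.

# Wipe ZFS labels from a host-local device so it can later be added as a
# fresh log vdev. Never point this at a device in a currently imported pool.
# "dev" must be the path that actually carries the labels (on Linux this is
# often the first partition if ZFS was originally given the whole disk).
wipe_slog() {
    dev="$1"
    zpool labelclear -f "$dev"
}

# Dirty takeover: the crashed peer's log device is unreachable.
#   -f  import even though the pool was last in use on another host
#   -m  import despite the missing log device; sync writes that existed
#       only on that device are lost
# "stale_log" is the name or GUID that `zpool status` shows for the
# missing log vdev after the import.
dirty_takeover() {
    pool="$1"; stale_log="$2"; local_dev="$3"
    zpool import -f -m "$pool" || return 1
    zpool remove "$pool" "$stale_log" || return 1
    zpool add "$pool" log "$local_dev"
}
```

The idea would be to run `dirty_takeover` on the surviving node, and `wipe_slog` on the repaired node before it ever imports the pool again, so the stale log contents are never seen by ZFS. A mirrored SLOG would need the same treatment for each member.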

dkobras commented 6 years ago

An SLOG device has only one reason to exist: not losing (sync) writes due to a crash. In your scenario, it's ok to lose writes in this case, so why bother with an SLOG device at all? It seems to me that just setting sync=disabled on your datasets should get you close to what you're trying to achieve, but with complexity significantly reduced.
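For anyone who wants to try the `sync=disabled` suggestion above, a small wrapper along these lines could guard against typos in the property value. This is a sketch; `set_sync` is a made-up helper name and the dataset in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch only: set_sync is a hypothetical helper, not part of zfs-ha.

# Set the ZFS "sync" property after validating the requested value.
#   standard - honor sync requests (the default)
#   always   - treat every write as synchronous
#   disabled - acknowledge sync writes before they reach stable storage
set_sync() {
    mode="$1"; dataset="$2"
    case "$mode" in
        standard|always|disabled) ;;
        *) echo "invalid sync mode: $mode" >&2; return 2 ;;
    esac
    zfs set "sync=$mode" "$dataset" || return 1
    # Print the effective value so the caller can confirm the change.
    zfs get -H -o value sync "$dataset"
}
```

For example, `set_sync disabled tank/vm`. Keep in mind that with `sync=disabled` a crash can lose recently acknowledged writes even without any failover involved, so it is a per-dataset trade-off.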

mwpastore commented 6 years ago

@dkobras At the risk of being wrong on the internet, doesn't an SLOG device also speed up sync writes?

ewwhite commented 6 years ago

The SLOG doesn't necessarily speed them up. It's there to ensure that those writes are on protected stable storage and acknowledged at low latency.

ewwhite commented 5 years ago

Updating... I suppose one option is to add a pre-import script that does whatever you need to the SLOG devices.

See: https://github.com/skiselkov/stmf-ha/blob/master/heartbeat/zfs-helper and modify the script to do what you'd like for pre/post import/export.
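As a rough illustration of the pre/post import/export idea, a hook dispatcher might look like the sketch below. The phase names and the `slog_<phase>` naming convention are invented for this example and are not taken from zfs-helper; the example hook simply clears ZFS labels from the device it is given.

```shell
#!/bin/sh
# Sketch only: the phase names and the slog_<phase> convention are
# hypothetical and do not mirror the interface of zfs-helper.

# Run the site-defined function slog_<phase>, if one exists, passing the
# remaining arguments through. Unknown phases are rejected with status 2.
run_slog_hook() {
    phase="$1"; shift
    case "$phase" in
        pre_import|post_import|pre_export|post_export) ;;
        *) echo "unknown phase: $phase" >&2; return 2 ;;
    esac
    hook="slog_$phase"
    if command -v "$hook" >/dev/null 2>&1; then
        "$hook" "$@"
    fi
}

# Example hook: clear stale labels from this node's log device before the
# pool is imported here. Only safe while the pool is not imported locally.
slog_pre_import() {
    zpool labelclear -f "$1"
}
```

A resource agent could then call `run_slog_hook pre_import /dev/nvme0n1p1` before its own `zpool import`; phases without a defined hook are silently skipped.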