allanjude / zxfer

A continuation of development on zxfer, a popular script for managing ZFS snapshot replication
BSD 2-Clause "Simplified" License

any thoughts as to being able to throttle zxfer? #52

Open eohrnberger opened 4 years ago

eohrnberger commented 4 years ago

Hi, I'm using zxfer to backup zfs data sets from one system's zpool to another system's zpool. zxfer in this application works wonderfully.

The zxfer backup job is usually run from cron at dark AM, so I never see the resulting system load being induced on the receiving system.

Running it manually today, and I'm seeing a system load higher than 60 while the transfer is in process.

Is there a way to 'throttle' the snapshot destruction and the data rate being sent from the source system to the target system? That would give the target system a little time to catch up, with the effect of bringing the system load down a bit.

I'm already running zxfer under 'nice -n 19' on the sending system, but that doesn't seem to have the desired effect.

Any ideas as to what I could try? Any help would be greatly appreciated.

allanjude commented 4 years ago

You can use the -D flag, which is designed for a progress bar, to also limit the throughput with an app like mbuffer or similar to introduce a rate limit.
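For example (the dataset names, host, and mbuffer options below are placeholders; adjust the rate to taste):

```shell
# Illustrative sketch only: route the stream through mbuffer via -D
# to cap throughput. -r 30M limits the transfer rate, -m 128M sets
# the buffer size, -q suppresses mbuffer's own status output.
zxfer -dFkPv \
  -D 'mbuffer -q -m 128M -r 30M' \
  -T root@backuphost \
  -R sourcepool/data targetpool/backups
```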

A system load of 60 seems unusual, as ZFS is only going to be using 1 thread for send or recv, plus maybe a prefetch thread, and then ssh.

What OS? What apps were using a lot of CPU when the load was 60?

snapshot deletion will already be very slow, because it does a full txg sync after each destroy.
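As an aside, if many snapshots are being pruned outside of zxfer, OpenZFS accepts a percent-delimited range so a run of snapshots goes into a single destroy operation rather than one command (and one txg sync) per snapshot. A sketch, with placeholder snapshot names:

```shell
# Preview what would be destroyed without actually doing it:
zfs destroy -nv pool/data@snap1%snap9

# Destroy the whole inclusive range snap1..snap9 in one operation:
zfs destroy pool/data@snap1%snap9
```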

eohrnberger commented 4 years ago

I'm running Gentoo Linux, kernel 4.19.66-gentoo, with ZFS v0.7.13-r0-gentoo (ZFS pool version 5000, ZFS filesystem version 5). The target machine has 32 GB of RAM and a 4-core Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz.

When it's copying data, the load seems to be between 12 and 20, spiking around 20, which I figure is OK.

eohrnberger commented 4 years ago

Thought about it some more.

I changed the script from a 'push' (run on the source system, sending to the target) into a 'pull' (run on the target system, fetching from the source), and this seems to help 'nice' manage the system load better. The load no longer peaks and holds above 20 - 25; there are still occasional peaks that high, but it usually stays around 15 - 20 or less. I think this is an improvement for my situation.
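Concretely, the pull invocation looks roughly like this (hostnames and dataset names changed; zxfer's -O takes the origin host for pull mode, where -T takes the target host for push mode):

```shell
# Run on the *target* (receiving) system, pulling from the source.
# nice now applies to the local recv side, where the load was high.
nice -n 19 zxfer -dFkPv \
  -O root@sourcehost \
  -R sourcepool/data targetpool/backups
```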

Thanks for listening. Sometimes it helps typing it into a web page to make the noodle work a bit more. :)

allanjude commented 4 years ago

Because it is Linux, remember that disk I/O factors into load average, not just CPU: tasks in uninterruptible sleep (usually blocked on I/O) count toward the load just like runnable tasks.
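A quick way to check that, assuming the usual procps/sysstat tools are installed:

```shell
# The three load averages:
uptime

# Tasks currently in uninterruptible sleep (state D) - these inflate
# the load average without using CPU:
ps -eo state,comm | awk '$1 == "D"' | sort | uniq -c

# 'wa' column shows the share of CPU time spent waiting on I/O:
vmstat 1 5
```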

eohrnberger commented 4 years ago

Yeah, I have xosview running, and I see that a lot of the CPU graph lines are predominantly red, which indicates WIO (I/O wait states). Even at high load numbers the system was still responsive, so CPU wasn't the bottleneck. Even so, I'm inclined to 'manage' that a bit so as not to run the load so high. Just me being a worrywart, I guess.