bparli / fastping-rs

ICMP ping library in Rust inspired by go-fastping and AnyEvent::FastPing Perl module
MIT License
78 stars 12 forks source link

Reduce jitter and wakeups in ping receiver loop #36

Closed rcloran closed 1 year ago

rcloran commented 1 year ago

The ping loop has a recv_timeout() with a timeout of 100ms, and separately checks whether the deadline has passed. This causes some minor problems:

In a client application, I logged the difference between arrival times of pings to one host on my local network (while also pinging a few others):

$ sudo RUST_LOG=info cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 1.73s
     Running `target/debug/foo`
[2023-09-02T20:49:43.436025Z INFO  foo] 771.542µs since last response from 192.168.8.1
[2023-09-02T20:49:45.536952Z INFO  foo] 2.100864042s since last response from 192.168.8.1
[2023-09-02T20:49:47.633193Z INFO  foo] 2.096031417s since last response from 192.168.8.1
[2023-09-02T20:49:49.638436Z INFO  foo] 2.005040959s since last response from 192.168.8.1
[2023-09-02T20:49:51.726829Z INFO  foo] 2.088198875s since last response from 192.168.8.1
[2023-09-02T20:49:53.735712Z INFO  foo] 2.008684584s since last response from 192.168.8.1
[2023-09-02T20:49:55.737953Z INFO  foo] 2.002044333s since last response from 192.168.8.1
[2023-09-02T20:49:57.843609Z INFO  foo] 2.10544825s since last response from 192.168.8.1
[2023-09-02T20:49:59.935109Z INFO  foo] 2.091310541s since last response from 192.168.8.1
[2023-09-02T20:50:01.942223Z INFO  foo] 2.007054s since last response from 192.168.8.1
[2023-09-02T20:50:04.032410Z INFO  foo] 2.09012125s since last response from 192.168.8.1
[2023-09-02T20:50:06.118306Z INFO  foo] 2.085717708s since last response from 192.168.8.1
[2023-09-02T20:50:08.218496Z INFO  foo] 2.099996958s since last response from 192.168.8.1
[2023-09-02T20:50:10.321198Z INFO  foo] 2.102489583s since last response from 192.168.8.1
[2023-09-02T20:50:12.419079Z INFO  foo] 2.097679791s since last response from 192.168.8.1

This change uses an alternative method, allowing timeout to be up to the maximum amount of time remaining. This reduces the jitter substantially:

$ sudo RUST_LOG=info cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/foo`
[2023-09-02T20:48:00.381874Z INFO  foo] Starting WS
[2023-09-02T20:48:00.381880Z INFO  foo] 641.625µs since last response from 192.168.8.1
[2023-09-02T20:48:02.387631Z INFO  foo] 2.005641916s since last response from 192.168.8.1
[2023-09-02T20:48:04.392757Z INFO  foo] 2.004919583s since last response from 192.168.8.1
[2023-09-02T20:48:06.397968Z INFO  foo] 2.004992083s since last response from 192.168.8.1
[2023-09-02T20:48:08.403081Z INFO  foo] 2.004902667s since last response from 192.168.8.1
[2023-09-02T20:48:10.407966Z INFO  foo] 2.004678458s since last response from 192.168.8.1
[2023-09-02T20:48:12.413226Z INFO  foo] 2.005053958s since last response from 192.168.8.1
[2023-09-02T20:48:14.418228Z INFO  foo] 2.004789291s since last response from 192.168.8.1
[2023-09-02T20:48:16.423268Z INFO  foo] 2.004827833s since last response from 192.168.8.1
[2023-09-02T20:48:18.428643Z INFO  foo] 2.005167s since last response from 192.168.8.1
[2023-09-02T20:48:20.433848Z INFO  foo] 2.004990292s since last response from 192.168.8.1
[2023-09-02T20:48:22.436062Z INFO  foo] 2.001999583s since last response from 192.168.8.1
[2023-09-02T20:48:24.442442Z INFO  foo] 2.006172375s since last response from 192.168.8.1
[2023-09-02T20:48:26.447208Z INFO  foo] 2.004584958s since last response from 192.168.8.1
[2023-09-02T20:48:28.451690Z INFO  foo] 2.004285s since last response from 192.168.8.1
rcloran commented 1 year ago

Since this also moves the start time timer to the top of the loop, before the echos are sent, it'll end up reducing the actual max_rtt -- though this is probably only meaningful when pinging a very large number of hosts. If that's not desirable, that might make the goal of reducing slew impossible, but the reduced wakeups might still be worthwhile.

rcloran commented 1 year ago

After toying with this a bit and thinking about it, I don't think the goal of getting the slew fixed really fits here. So I'll update with a simplified PR that just reduces jitter and wakeups.

bparli commented 1 year ago

After toying with this a bit and thinking about it, I don't think the goal of getting the slew fixed really fits here. So I'll update with a simplified PR that just reduces jitter and wakeups.

Right, I don't think we want to adjust slew here for this case. Addressing jitter seems reasonable to me

rcloran commented 1 year ago

Playing around some more ... there's another potential downside to this. It now reads Instant::now() twice per packet received; once for the RTT and once for the timeout. AIUI, this is a system call each time (I'll check that at some point), which could be a performance problem for a large number of targets.

An alternative is to use the RTT from the listener. The downside of that is that it doesn't take time taken for the message to get through the thread channel into account.

This may be a non-issue. I've been trying to benchmark against a small number of hosts on my home network, and see changes, but they are probably within noise as I haven't been super careful about how I'm performing the measurements.

Happy for you to let this PR linger until I'm a bit more clear on the effects of it, or, if you have opinions I'd be happy to listen.

rcloran commented 1 year ago

This may be a non-issue. I've been trying to benchmark against a small number of hosts on my home network, and see changes, but they are probably within noise as I haven't been super careful about how I'm performing the measurements.

I reworked the bench setup to use criterion and cargo-flamegraph, pinging 127.0.0.{1..255}, and seem to be getting reliable (enough) results. In this setup the time is heavily dominated by time in sendto(2). The only samples I could find to do with time (clock_gettime(3) and mach_absolute_time) were about 0.03% of total time spent in the program, though that includes setup (which seems to take a considerable fraction).

I think performance of this patch is a non-issue.

bparli commented 1 year ago

This looks good and reducing the jitter makes sense to me.

I think performance of this patch is a non-issue.

I agree, but I don't think we need to worry about performance hit in this case. IIRC clock_gettime uses vDSO so not a system call strictly speaking