Hi Max,

While I am unsuccessfully trying to get both high throughput and low latency, here is a tiny test for the lower bound on latency.

It's the lower bound because the code checks nothing :-) except the number of bytes sent/received and the timestamps of the beginning and end of each send/receive.

I have run this code on m4.16xlarge EC2 instances. It's very simple: you issue one command on one machine and the matching one on the other.

(The value of --n here is 100M, so that approximately 12 lines are printed per second. The --k parameter is just to keep it running effectively forever.)

The result looks like this:

Block 1 sent at 1.652GB/s, received at 1.666 GB/s, average latency 3.94ms
Block 2 sent at 1.631GB/s, received at 1.625 GB/s, average latency 3.80ms
Block 3 sent at 1.594GB/s, received at 1.607 GB/s, average latency 3.65ms
Block 4 sent at 1.632GB/s, received at 1.628 GB/s, average latency 3.48ms
Block 5 sent at 1.573GB/s, received at 1.555 GB/s, average latency 3.92ms
Block 6 sent at 1.601GB/s, received at 1.622 GB/s, average latency 3.87ms
Block 7 sent at 1.586GB/s, received at 1.608 GB/s, average latency 3.03ms
Block 8 sent at 1.658GB/s, received at 1.609 GB/s, average latency 3.51ms
Block 9 sent at 1.412GB/s, received at 1.443 GB/s, average latency 3.66ms
Block 10 sent at 1.210GB/s, received at 1.201 GB/s, average latency 3.19ms
Block 11 sent at 1.198GB/s, received at 1.201 GB/s, average latency 3.37ms
Block 12 sent at 1.195GB/s, received at 1.200 GB/s, average latency 3.08ms
Block 13 sent at 1.212GB/s, received at 1.201 GB/s, average latency 3.28ms
Block 14 sent at 1.193GB/s, received at 1.199 GB/s, average latency 3.48ms
Block 15 sent at 1.208GB/s, received at 1.202 GB/s, average latency 3.49ms
Block 16 sent at 1.195GB/s, received at 1.201 GB/s, average latency 3.52ms
Block 17 sent at 1.194GB/s, received at 1.201 GB/s, average latency 3.08ms
Block 18 sent at 1.206GB/s, received at 1.200 GB/s, average latency 3.06ms
Block 19 sent at 1.194GB/s, received at 1.201 GB/s, average latency 3.03ms

The key observations are:

- EC2's throttling really works. It's easy to see the throughput dropping to exactly 1.2 GB/s after about a second. Well done, Amazon.

- The latency is, well, non-negligible. I was honestly expecting better, and experimenting with different values of --n, on both the forwarding side and the latency-measuring one, did not help much.

- There are bad apples on EC2! For this particular test, the first two machines I spun up, A and B, were capped at 0.6 GB/s. I added a third one, C, and discovered that A <=> C can do 1.2 GB/s, while A <=> B and B <=> C can only do 0.6 GB/s. This is something to be aware of.
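The letter doesn't include the tool's source, but the measurement it describes -- count bytes, take timestamps around each send/receive, check nothing else -- can be sketched over loopback. Everything below (the block size, the block count, the echo protocol, round-trip rather than one-way latency) is an assumption for illustration, not the actual tool:

```python
import socket
import threading
import time

BLOCK = 32 * 1024   # bytes per block -- hypothetical, not the letter's --n
BLOCKS = 16         # number of blocks -- hypothetical, not the letter's --k

def echo_server(srv):
    # Accept one connection and echo every received chunk back.
    conn, _ = srv.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)

def run_once():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # loopback stand-in for the second EC2 box
    srv.listen(1)
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

    cli = socket.create_connection(srv.getsockname())
    payload = b"x" * BLOCK
    total, latencies = 0, []
    start = time.perf_counter()
    for _ in range(BLOCKS):
        t0 = time.perf_counter()
        cli.sendall(payload)
        got = 0
        while got < BLOCK:            # read until the full block is echoed
            got += len(cli.recv(65536))
        latencies.append(time.perf_counter() - t0)
        total += got
    elapsed = time.perf_counter() - start
    cli.close()
    srv.close()
    print(f"{BLOCKS} blocks echoed at {total / elapsed / 1e9:.3f} GB/s, "
          f"average round trip {1000 * sum(latencies) / BLOCKS:.2f} ms")
    return total, latencies

if __name__ == "__main__":
    run_once()
```

Over loopback the numbers mean nothing, of course; the point is only the shape of the measurement -- bytes counted, timestamps taken, nothing else verified.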
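Spotting where the cap kicks in doesn't have to be done by eye. The helper below parses lines in the exact format of the sample output above; the 1.25 GB/s threshold is an arbitrary cutoff between the ~1.6 GB/s burst and the 1.2 GB/s sustained rate, not anything EC2 documents:

```python
import re

# Line format taken from the sample output in the letter.
LINE = re.compile(
    r"Block (\d+) sent at ([\d.]+)\s*GB/s, "
    r"received at ([\d.]+)\s*GB/s, average latency ([\d.]+)\s*ms")

def first_capped_block(lines, cap=1.25):
    """Return the first block whose send rate is at or below `cap` GB/s."""
    for line in lines:
        m = LINE.match(line)
        if m and float(m.group(2)) <= cap:
            return int(m.group(1))
    return None  # no block fell to the cap
```

On the sample output above it points at block 10, matching the eyeballed "after about a second".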