C5T / Current

C++ framework for realtime machine learning.
https://medium.com/dima-korolev/current-for-realtime-machine-learning-4f04aa8ab81a

Trivial latency measurements #870

Closed dkorolev closed 5 years ago

dkorolev commented 5 years ago

Hi Max,

While I am still trying, so far unsuccessfully, to get both high throughput and low latency, here is a tiny test that establishes a lower bound on latency.

It's a lower bound because the code checks nothing :-) except the number of bytes sent and received, and the timestamps at the beginning and end of each send and receive.
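To make the bookkeeping concrete, here is a minimal sketch of how per-block numbers could be derived from nothing but a byte count and four timestamps. This is not the actual `latency_trivial` code: the `BlockTimings` and `BlockStats` structs and both functions are hypothetical names, and the latency definition (the mean of the begin-to-begin and end-to-end delays) is an assumption, since the exact formula is not spelled out here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical record of what is measured for one block: the byte count
// and the begin/end timestamps of the send and of the receive.
struct BlockTimings {
  std::chrono::microseconds send_begin;
  std::chrono::microseconds send_end;
  std::chrono::microseconds recv_begin;
  std::chrono::microseconds recv_end;
  uint64_t bytes;  // Bytes sent; assumed equal to the bytes received.
};

struct BlockStats {
  double send_gbps;   // Send throughput, in GB/s.
  double recv_gbps;   // Receive throughput, in GB/s.
  double latency_ms;  // Latency estimate, in milliseconds.
};

// Bytes per microsecond equals MB/s, so scaling by 1e-3 yields GB/s.
inline double GigabytesPerSecond(uint64_t bytes, std::chrono::microseconds elapsed) {
  assert(elapsed.count() > 0);
  return static_cast<double>(bytes) / static_cast<double>(elapsed.count()) * 1e-3;
}

inline BlockStats Summarize(const BlockTimings& t) {
  BlockStats stats;
  stats.send_gbps = GigabytesPerSecond(t.bytes, t.send_end - t.send_begin);
  stats.recv_gbps = GigabytesPerSecond(t.bytes, t.recv_end - t.recv_begin);
  // Assumption: latency is the mean of the begin-to-begin and end-to-end delays.
  const auto begin_delay = t.recv_begin - t.send_begin;
  const auto end_delay = t.recv_end - t.send_end;
  stats.latency_ms = static_cast<double>((begin_delay + end_delay).count()) / 2.0 / 1000.0;
  return stats;
}
```

With these definitions, a 100,000,000-byte block sent over 62.5ms and received with a constant 3ms offset would come out as 1.6GB/s in both directions with a latency of 3ms.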


I ran this code on m4.16xlarge EC2 instances. The setup is very simple. On one machine, run:

while true ; do ./.current/forward --sendto_host $OTHER_IP ; done

and on the other one:

./.current/latency_trivial --sendto_host $OTHER_IP --n 100000000 --k 10000

(The value of --n here is 100M, so approximately 12 lines are printed per second at the 1.2GB/s the link settles at. The large --k value is just to keep it running effectively forever.)
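The "12 lines per second" figure is plain arithmetic, sketched below. It assumes that --n is the block size in bytes and that one line is printed per block; `BlocksPerSecond` is a hypothetical helper, not part of Current.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of blocks (and hence printed lines) per second,
// assuming one line per block and a block size given in bytes.
inline double BlocksPerSecond(double gigabytes_per_second, uint64_t block_size_bytes) {
  assert(block_size_bytes > 0);
  return gigabytes_per_second * 1e9 / static_cast<double>(block_size_bytes);
}
```

Under those assumptions, `BlocksPerSecond(1.2, 100000000)` is about 12, and at the initial rate of roughly 1.6GB/s it is about 16.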

The result looks like this:

Block     1 sent at 1.652GB/s, received at 1.666 GB/s, average latency 3.94ms
Block     2 sent at 1.631GB/s, received at 1.625 GB/s, average latency 3.80ms
Block     3 sent at 1.594GB/s, received at 1.607 GB/s, average latency 3.65ms
Block     4 sent at 1.632GB/s, received at 1.628 GB/s, average latency 3.48ms
Block     5 sent at 1.573GB/s, received at 1.555 GB/s, average latency 3.92ms
Block     6 sent at 1.601GB/s, received at 1.622 GB/s, average latency 3.87ms
Block     7 sent at 1.586GB/s, received at 1.608 GB/s, average latency 3.03ms
Block     8 sent at 1.658GB/s, received at 1.609 GB/s, average latency 3.51ms
Block     9 sent at 1.412GB/s, received at 1.443 GB/s, average latency 3.66ms
Block    10 sent at 1.210GB/s, received at 1.201 GB/s, average latency 3.19ms
Block    11 sent at 1.198GB/s, received at 1.201 GB/s, average latency 3.37ms
Block    12 sent at 1.195GB/s, received at 1.200 GB/s, average latency 3.08ms
Block    13 sent at 1.212GB/s, received at 1.201 GB/s, average latency 3.28ms
Block    14 sent at 1.193GB/s, received at 1.199 GB/s, average latency 3.48ms
Block    15 sent at 1.208GB/s, received at 1.202 GB/s, average latency 3.49ms
Block    16 sent at 1.195GB/s, received at 1.201 GB/s, average latency 3.52ms
Block    17 sent at 1.194GB/s, received at 1.201 GB/s, average latency 3.08ms
Block    18 sent at 1.206GB/s, received at 1.200 GB/s, average latency 3.06ms
Block    19 sent at 1.194GB/s, received at 1.201 GB/s, average latency 3.03ms

The key observations are:

  1. EC2's throttling really works. It's easy to see the throughput dropping to almost exactly 1.2GB/s after about a second. Well done, Amazon.
  2. The latency is, well, non-negligible. I was honestly expecting better. Experimenting with different --n values, on both the forwarding side and the latency-measuring side, did not help much.
  3. There are bad apples on EC2! For this particular test, the first two machines I spun up, A and B, were capped at 0.6GB/s. I added a third one, C, and discovered that A <=> C can do 1.2GB/s, while A <=> B and B <=> C can only do 0.6GB/s. This is something to be aware of.
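The third observation suggests a simple sanity check before benchmarking: measure every pair of machines and flag any host whose best link is still below the expected rate. The sketch below is one way to do that, under the assumption that a host is "capped" if none of its links reaches 90% of the expected throughput. `LinkMeasurement` and `FindCappedHosts` are hypothetical names, and the tolerance is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one pairwise throughput measurement.
struct LinkMeasurement {
  std::string host_a;
  std::string host_b;
  double gbps;  // Measured throughput between the two hosts, in GB/s.
};

// Returns, in sorted order, the hosts whose best measured link is below
// `expected_gbps * tolerance`. A host that reaches the expected rate with
// at least one peer is considered healthy.
inline std::vector<std::string> FindCappedHosts(const std::vector<LinkMeasurement>& links,
                                                double expected_gbps,
                                                double tolerance = 0.9) {
  std::map<std::string, double> best_gbps;  // Best link seen per host.
  for (const auto& link : links) {
    assert(link.gbps >= 0.0);
    double& best_a = best_gbps[link.host_a];
    best_a = std::max(best_a, link.gbps);
    double& best_b = best_gbps[link.host_b];
    best_b = std::max(best_b, link.gbps);
  }
  std::vector<std::string> capped;
  for (const auto& entry : best_gbps) {
    if (entry.second < expected_gbps * tolerance) {
      capped.push_back(entry.first);
    }
  }
  return capped;
}
```

Fed the three measurements above (A <=> B at 0.6, A <=> C at 1.2, B <=> C at 0.6) with an expected rate of 1.2GB/s, this flags only B. Note that with just A and B measured, both would be flagged, which is why the third machine was needed to tell them apart.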