Mellanox / sockperf

Network Benchmarking Utility

How to read sockperf output #172

Closed LuisRodriguezMSFT closed 2 years ago

LuisRodriguezMSFT commented 2 years ago

Hi, this may be a basic question, but I can't find a clear answer in the docs.

I am measuring the latency between two VMs. Based on my network infrastructure I would expect to get around 2 ms, but the percentiles are showing something different:

sockperf ping-pong -i 10.12.0.4 --tcp -m 350 -t 200 -p 12345 --full-rtt
sockperf: == version #3.8-0.git31ee322aa82a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 10.12.0.4 PORT = 12345 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=199.999 sec; Warm up time=400 msec; SentMessages=105665; ReceivedMessages=105664
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=199.548 sec; SentMessages=105413; ReceivedMessages=105413
sockperf: ====> avg-rtt=1892.600 (std-dev=422.741, mean-ad=240.040, median-ad=243.961, siqr=169.591, cv=0.223, std-error=1.302, 99.0% ci=[1889.246, 1895.954])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Round trip is 1892.600 usec
sockperf: Total 105413 observations; each percentile contains 1054.13 observations
sockperf: ---> MAX observation = 11541.068
sockperf: ---> percentile 99.999 = 11533.498
sockperf: ---> percentile 99.990 = 8954.485
sockperf: ---> percentile 99.900 = 7703.087
sockperf: ---> percentile 99.000 = 3031.973
sockperf: ---> percentile 90.000 = 2252.356
sockperf: ---> percentile 75.000 = 2013.136
sockperf: ---> percentile 50.000 = 1821.435
sockperf: ---> percentile 25.000 = 1673.954
sockperf: ---> MIN observation = 1270.749

As you can see, the round trip is 1892.600 usec, which is fine (about 1.89 ms, if I am not mistaken). But the percentiles (90.000, 99.000, ...) show higher times, up to 11533.498 usec at percentile 99.999, for instance.

I would like to clarify how to read this data and what those percentiles mean. Does this mean there were frames that took 11533.498 usec to be replied to? Is that something to worry about in terms of latency?

Thank you,
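
As a sanity check on the avg-rtt line, the reported figures are consistent with the usual textbook relationships: std-error = std-dev / sqrt(n), cv = std-dev / mean, and a 99% confidence interval of mean ± z × std-error under a normal approximation. The sketch below only re-derives the numbers from the output above; that sockperf computes them exactly this way is an assumption, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values copied from the sockperf output above (usec).
avg_rtt = 1892.600
std_dev = 422.741
observations = 105413

# Standard error of the mean: spread of the average, not of single samples.
std_error = std_dev / math.sqrt(observations)

# Coefficient of variation: std-dev relative to the mean.
cv = std_dev / avg_rtt

# Two-sided 99% interval under a normal approximation (z is about 2.576).
z = NormalDist().inv_cdf(0.995)
ci_low = avg_rtt - z * std_error
ci_high = avg_rtt + z * std_error

print(f"std-error={std_error:.3f} cv={cv:.3f} ci=[{ci_low:.3f}, {ci_high:.3f}]")
```

The narrow interval says the average is known precisely; it says nothing about how slow individual round trips can be, which is what the percentile lines report.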

igor-ivanov commented 2 years ago

> I wanted to clarify how to read this data and know what those percentiles mean.

The 99th percentile is defined as the value that 99 out of 100 samples fall below. Thus 99 of every 100 samples observe a latency less than this value, and 1 in every 100 observes a latency equal to or greater than it. See the function that prints the percentiles for more details: https://github.com/Mellanox/sockperf/blob/sockperf_v2/src/client.cpp#L576

> Does this mean there were frames that took 11533.498 usec to be replied to?

yes
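
To make the percentile definition concrete, here is a minimal sketch using the nearest-rank method on a small, hypothetical set of latencies. The sample values are made up for illustration, and nearest-rank is an assumption here: the sockperf function linked above may select the sample slightly differently.

```python
import math
from statistics import mean


def percentile_nearest_rank(samples, p):
    """Return the smallest sample such that at least p percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical round-trip times in usec: 98 typical samples and 2 slow outliers.
latencies_usec = [1800.0] * 98 + [3000.0, 11500.0]

average = mean(latencies_usec)
p50 = percentile_nearest_rank(latencies_usec, 50)
p99 = percentile_nearest_rank(latencies_usec, 99)
worst = percentile_nearest_rank(latencies_usec, 100)

print(f"avg={average} p50={p50} p99={p99} max={worst}")
```

Two slow samples out of a hundred barely move the average, yet they fully determine the 99th percentile and the maximum. The same effect explains an avg-rtt under 2 ms alongside a 99.999 percentile above 11 ms.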

LuisRodriguezMSFT commented 2 years ago

Thanks Igor,

Since avg-rtt is much lower (1892.600 usec, about 1.89 ms), is that 99th percentile value something to worry about in terms of latency? If my network requirement is an average of 2 ms or less, am I safe taking only the avg-rtt into account?

I ask because I tested this even on VMs within the same host (minimal latency), and while the avg-rtt is good I still get high values at the 99th percentile.
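
One way to weigh the average against the tail is to read the percentile table as counts. The sketch below uses only the numbers reported in the output above; the helper names are hypothetical. It estimates how many of the 105413 round trips were slower than each percentile value, and brackets the share of round trips that finished within a 2000 usec budget.

```python
# Values copied from the sockperf output above (percentile -> usec).
TOTAL_OBSERVATIONS = 105413
REPORTED_PERCENTILES_USEC = {
    99.999: 11533.498,
    99.99: 8954.485,
    99.9: 7703.087,
    99.0: 3031.973,
    90.0: 2252.356,
    75.0: 2013.136,
    50.0: 1821.435,
    25.0: 1673.954,
}


def expected_slower(total, percentile):
    """Approximate number of observations at or above the given percentile value."""
    return total * (100 - percentile) / 100


def bracket_share_within(budget_usec, table):
    """Return (lower, upper) percent bounds on the share of samples within the budget."""
    below = [p for p, value in table.items() if value <= budget_usec]
    above = [p for p, value in table.items() if value > budget_usec]
    lower = max(below) if below else 0.0
    upper = min(above) if above else 100.0
    return lower, upper


for pct, value in sorted(REPORTED_PERCENTILES_USEC.items()):
    count = expected_slower(TOTAL_OBSERVATIONS, pct)
    print(f"p{pct}: about {count:.2f} round trips took {value} usec or longer")

print(bracket_share_within(2000.0, REPORTED_PERCENTILES_USEC))
```

For this run the bracket is 50% to 75%: the 75th percentile (2013.136 usec) is already above 2 ms, so roughly a quarter or more of the round trips exceeded the budget even though the average is below it. Whether that matters depends on whether the requirement is stated for the average or for the tail.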

igor-ivanov commented 2 years ago

It would be nice to reduce the abnormal values. That might be done by tuning the servers and configuration, binding the application to the correct cores, etc.
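
As one illustration of binding a process to a core, here is a minimal sketch that pins the current process to a single CPU using Python's `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_current_process` is hypothetical, and which core is the "correct" one (for example, one close to the NIC and not shared with noisy workloads) depends on the host and is not decided here.

```python
import os


def pin_current_process(cpu):
    """Pin the calling process to one CPU; return the new affinity set.

    Returns None on platforms without os.sched_setaffinity (non-Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        # Pick the lowest-numbered CPU this process may run on (arbitrary choice).
        target = min(os.sched_getaffinity(0))
        print(pin_current_process(target))
    else:
        print("CPU affinity is not supported on this platform")
```

On Linux, CPU affinity is inherited by child processes, so a benchmark launched from a pinned process starts on the same core. Whether pinning actually reduces the tail in a virtualized setup is something to measure rather than assume.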

LuisRodriguezMSFT commented 2 years ago

Thanks Igor