Mellanox / sockperf

Network Benchmarking Utility

Floating point exception with --histogram #164

Status: Open. O-ring opened this issue 2 years ago.

O-ring commented 2 years ago

Hello everyone,

I am experiencing a sockperf crash when I run it with the --histogram option.

Unfortunately it's not deterministic: sometimes it happens, sometimes it doesn't.

sockperf is compiled from source (current git version as of January 18, 2022) on Slackware 14.2 64-bit. The kernel version is 5.16.2 (vanilla).

This is the script to reproduce the problem:

#!/bin/sh
{
    /usr/bin/sockperf pp --msg-size 64 -i 10.24.15.2 --time 3 --increase_output_precision --full-rtt --tcp --histogram 500:0:200000
} >> /tmp/LATENZA/crash 2>&1

And this is the complete output of the script:

sockperf: == version #3.7-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 10.24.15.2 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=3.000 sec; Warm up time=400 msec; SentMessages=99; ReceivedMessages=98
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=2.528 sec; SentMessages=83; ReceivedMessages=83
sockperf: ====> avg-rtt=30452.677 (std-dev=391.370, mean-ad=131.624, median-ad=13.812, siqr=9.198, cv=0.013, std-error=42.958, 99.0% ci=[30342.023, 30563.331])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Round trip is 30452.677 usec
sockperf: Total 83 observations; each percentile contains 0.83 observations
sockperf: ---> <MAX> observation = 33157.322
sockperf: ---> percentile 99.999 = 33157.322
sockperf: ---> percentile 99.990 = 33157.322
sockperf: ---> percentile 99.900 = 33157.322
sockperf: ---> percentile 99.000 = 32552.695
sockperf: ---> percentile 90.000 = 30418.552
sockperf: ---> percentile 75.000 = 30393.695
sockperf: ---> percentile 50.000 = 30384.854
sockperf: ---> percentile 25.000 = 30375.297
sockperf: ---> <MIN> observation = 30325.674
sockperf: [Histogram] Display scaled to fit on screen (Key: '#' = up to 0 samples)
sockperf: bins
misura_latenza.sh: line 10: 8762 Floating point exception/usr/bin/sockperf pp --msg-size 64 -i 10.24.15.2 --time 3 --increase_output_precision --full-rtt --tcp --histogram 500:0:200000

The Linux kernel prints this message:

traps: sockperf[8762] trap divide error ip:4af0db sp:7ffd3d103960 error:0 in sockperf[400000+63a000]
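Because the crash is intermittent, it can help to re-run the reproduction command in a loop until it fails. The shell's "Floating point exception" message means the process was killed by SIGFPE. On x86 Linux the kernel delivers SIGFPE for an integer division fault (such as division by zero), which is what "trap divide error" in the kernel log reports. Below is a minimal POSIX sh sketch. The function name run_until_signal is hypothetical and not part of sockperf. It assumes the command never exits on its own with a status above 128, since a shell reports death by signal N as status 128 + N (so SIGFPE, signal 8, shows up as 136).

    #!/bin/sh
    # Hypothetical helper (not part of sockperf): re-run a command up to
    # MAX_RUNS times and stop at the first run that is killed by a signal.
    # Assumption: the command itself never exits with a status above 128,
    # because the shell reports "killed by signal N" as status 128 + N.
    run_until_signal() {
        max_runs=$1
        shift
        run=1
        while [ "$run" -le "$max_runs" ]; do
            "$@" >/dev/null 2>&1
            rc=$?
            if [ "$rc" -gt 128 ]; then
                # POSIX `kill -l` maps a signal exit status back to its name,
                # e.g. 136 -> FPE.
                echo "run $run: killed by SIG$(kill -l "$rc")"
                return 0
            fi
            run=$((run + 1))
        done
        echo "no signal in $max_runs runs"
        return 1
    }

    # Example, using the command from the report:
    # run_until_signal 50 /usr/bin/sockperf pp --msg-size 64 -i 10.24.15.2 \
    #     --time 3 --increase_output_precision --full-rtt --tcp \
    #     --histogram 500:0:200000

When the crash reproduces, the helper prints a line such as "run 7: killed by SIGFPE". The run number depends on when the crash actually occurs. Output from the command itself is discarded here, so the original script is still the better way to capture sockperf's own output.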

igor-ivanov commented 2 years ago

Thank you, @O-ring, for the issue details. It will be handled.

O-ring commented 2 years ago

Hello Igor,

Thanks for the response. I hope the report has all the information you need.

Cheers, Marco