Mellanox / sockperf

Network Benchmarking Utility

sockperf tcp mode: 100% cpu #180

Open O-ring opened 2 years ago

O-ring commented 2 years ago

Hello everyone,

I've seen that sockperf started with the following command:

sockperf sr --tcp --daemonize

consumes 100% of the CPU after logging this message:

ADDR = 0.0.0.0:11111 # TCP sockperf: ERROR: Message received was larger than expected, message ignored. (errno=0 Success)

Is that expected behavior?

This is the output from top:

top - 15:33:28 up 176 days, 3:41, 1 user, load average: 1.12, 1.10, 1.09
Tasks: 147 total, 3 running, 144 sleeping, 0 stopped, 0 zombie
%Cpu(s): 25.6 us, 1.3 sy, 0.0 ni, 73.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 3740.4 total, 2005.3 free, 343.4 used, 1391.7 buff/cache
MiB Swap: 6366.3 total, 6365.8 free, 0.5 used. 3335.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30141 root 20 0 31940 11024 2844 R 100.0 0.3 826:40.60 sockperf

After that happens, I have no alternative but to kill the process and restart it.
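For anyone who wants to spot this state without watching top by hand, here is a minimal sketch. It assumes top's default column layout as shown above (%CPU in column 9, COMMAND in column 12); the function name `spinning_pids` and the 95% threshold are illustrative choices, not part of sockperf.

```shell
#!/bin/sh
# Hypothetical helper: reads process lines in top's default column layout
# (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) from stdin and
# prints the PID of every sockperf process at or above a CPU threshold.
spinning_pids() {
    threshold="${1:-95}"   # assumed default threshold, in percent
    awk -v t="$threshold" '$12 == "sockperf" && $9 + 0 >= t + 0 { print $1 }'
}

# Example using the process line from the top output in this report:
echo '30141 root 20 0 31940 11024 2844 R 100.0 0.3 826:40.60 sockperf' | spinning_pids 95
```

On a live system the input could come from `top -b -n 1` instead of `echo`, provided the column layout has not been customized.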

Any feedback is welcome.

Marco

igor-ivanov commented 2 years ago

Hello @O-ring, thank you for sharing the issue. Could you provide the version number or commit of sockperf you used? In addition, please describe the scenario that produces the error in your issue description, to simplify reproduction.

O-ring commented 2 years ago

Hello Igor,

the version installed is the one downloaded on May 20 from the repository (it also happens with earlier versions).

As for the scenario on my end, it is a Slackware Linux 14.2 system with kernel 5.15. It is a multihomed system (there are four network cards).

Unfortunately, I cannot tell you anything about the remote system that triggers the problem. It is not under my control.

The following processes run on the system:

sockperf sr --daemonize
sockperf sr --tcp --daemonize

The one occupying 100% of the CPU is the process launched with the "--tcp" parameter.

Usually, I wait a day and find the process at 100%. For now I have worked around the problem by using the firewall to block the IP address that triggers the event.
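The firewall rule itself was not posted, so the following is only a sketch of what such a workaround could look like, assuming iptables is the firewall in use and sockperf listens on port 11111 as in the log. The function name `block_rule` is made up for illustration, and 203.0.113.10 is a placeholder from the documentation address range, not the actual remote host.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) an iptables rule that drops
# TCP traffic from one source address to the sockperf port.
# Assumptions: iptables is the firewall; 11111 is the port seen in the log.
block_rule() {
    src_ip="$1"
    port="${2:-11111}"
    printf 'iptables -I INPUT -s %s -p tcp --dport %s -j DROP\n' "$src_ip" "$port"
}

# 203.0.113.10 is a placeholder address, not the host from this report.
block_rule 203.0.113.10
```

Printing the rule instead of applying it keeps the sketch safe to run as an unprivileged user; the printed line would have to be run as root to take effect.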

I am available to perform tests.

Marco

O-ring commented 2 years ago

Hello Igor,

Attached is the capture I made with tcpdump; 10.24.15.6 is the Linux system where sockperf is running.

This is what is printed on the console:

Sat May 21 22:52:23 UTC 2022: ADDR = 0.0.0.0:11111 # TCP sockperf: ERROR: Message received was larger than expected, message ignored. (errno=0 Success)
Sat May 21 22:52:23 UTC 2022: ADDR = 0.0.0.0:11111 # TCP sockperf: ERROR: Message received was larger than expected, message ignored. (errno=0 Success)
Sat May 21 22:52:23 UTC 2022: ADDR = 0.0.0.0:11111 # TCP sockperf: ERROR: Message received was larger than expected, message ignored. (errno=0 Success)

Marco

11111.pcap.gz
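The exact tcpdump invocation used for this capture was not given. As a rough sketch for anyone collecting a similar trace, the helper below prints a command line that would record full packets for the sockperf TCP port; the function name `capture_cmd`, the `-i any` interface choice, and the default output file name are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: prints a tcpdump command line that would record full
# packets for one TCP port into a pcap file. It only prints the command;
# the result must be run as root to actually capture.
capture_cmd() {
    port="${1:-11111}"
    out="${2:-sockperf-$port.pcap}"   # assumed default file name
    # -i any: all interfaces (Linux), -nn: no name lookups, -s 0: full packets
    printf 'tcpdump -i any -nn -s 0 -w %s tcp port %s\n' "$out" "$port"
}

capture_cmd 11111 11111.pcap
```

On a multihomed host, `-i any` avoids having to guess which of the network cards the triggering traffic arrives on.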

igor-ivanov commented 2 years ago

Thank you, @O-ring. It will be investigated.

Aetrius commented 7 months ago

bump... any results?

O-ring commented 7 months ago

I'm able to reproduce the problem with git-current.

Marco

Aetrius commented 7 months ago

@O-ring

I just made this to run in Kubernetes or Docker. Try it out?

https://github.com/Aetrius/msockperf