HewlettPackard / netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.
MIT License

Behavior on version mismatch between netperf and netserver #64

Open thatsdone opened 2 years ago

thatsdone commented 2 years ago

I got the following error message when running netperf in my lab.

$ netperf  -H 172.20.110.227
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.20.110.227 () port 0 AF_INET
netperf: send_omni: connect_data_socket failed: Connection refused

At first, I checked the iptables configuration and similar settings on the netserver side. After some more analysis, while reading an strace log of netserver, I noticed that there was a version mismatch between netperf (2.5.0) and netserver (2.7.0).

$ ss -antp  | grep 12865
LISTEN      0        128                     *:12865                  *:*        users:(("netserver",pid=154671,fd=3))

$ sudo strace -p 154671 -f
strace: Process 154671 attached
select(4, [3], [], [], NULL)            = 1 (in [3])
accept(3, {sa_family=AF_INET6, sin6_port=htons(37730), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:172.20.105.106", &sin6_addr), sin6_scope_id=0}, [128->28]) = 6

(snip)

I began to suspect a version mismatch when I saw the trace lines below.

[pid 158337] write(3, "unknown test number 0\n", 22) = 22
[pid 158337] sendto(6, "\0\0\0b\0\0\3\346\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0"..., 256, 0, NULL, 0) = 256
[pid 158337] recvfrom(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256, 0, NULL, NULL) = 144
[pid 158337] recvfrom(6, 0x55deaa20e250, 112, 0, NULL, NULL) = -1 ECONNRESET (Connection reset by peer)
[pid 158337] write(3, "recv_request: error on recv  err"..., 39) = 39
[pid 158337] exit_group(1)              = ?
[pid 158337] +++ exited with 1 +++
^Cstrace: Process 154671 detached
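The trace suggests that netserver reads control messages in fixed 256-byte units: it received 144 bytes, asked for the remaining 112, got ECONNRESET, and then only logged "recv_request: error on recv". As a minimal sketch (Python, not netperf's actual C code), a read-exactly-N-bytes helper could say how far it got before the peer went away and hint at a possible version mismatch. The names recv_exact, ShortControlMessage and CONTROL_MSG_LEN are hypothetical, and the 256-byte size is an assumption inferred from the trace.

```python
import socket

# Assumption: control messages are 256 bytes, as the sendto/recvfrom sizes in the trace suggest.
CONTROL_MSG_LEN = 256


class ShortControlMessage(Exception):
    """Raised when the peer stops sending before a full control message arrived."""


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, reporting how many arrived if the peer goes away early."""
    buf = bytearray()
    while len(buf) < n:
        try:
            chunk = sock.recv(n - len(buf))
        except ConnectionResetError as exc:
            # Peer reset the connection mid-message (the ECONNRESET case in the trace).
            raise ShortControlMessage(
                f"connection reset after {len(buf)} of {n} bytes; "
                "the peer may be running an incompatible netperf version"
            ) from exc
        if not chunk:
            # Orderly close before the message was complete.
            raise ShortControlMessage(
                f"peer closed the connection after {len(buf)} of {n} bytes; "
                "the peer may be running an incompatible netperf version"
            )
        buf += chunk
    return bytes(buf)


if __name__ == "__main__":
    # Simulate a peer that sends only part of a control message and then disconnects.
    a, b = socket.socketpair()
    a.sendall(b"\0" * 144)
    a.close()
    try:
        recv_exact(b, CONTROL_MSG_LEN)
    except ShortControlMessage as err:
        print(f"recv_request: {err}")
    b.close()
```

The byte counts alone do not prove a version mismatch, which is why the hint is phrased as "may be".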

I do understand that netserver and netperf should be the same version, but this error message still makes it hard to figure out what is going on.

My point is that this situation is tricky to diagnose, and I think it would be better for the client (netperf) side to output more information, such as a hint that netserver may be running a different version.
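One way the client could report this more clearly is to compare versions during connection setup and fail early with both versions in the message. The Python sketch below is purely illustrative: it assumes a hypothetical handshake in which netserver reports its version string to the client, which is not a description of netperf's current control protocol, and the names parse_version, check_peer_version and VersionMismatch are made up for this example.

```python
from typing import Tuple


class VersionMismatch(RuntimeError):
    """Raised when the client and server report different versions."""


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '2.7.0' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def check_peer_version(client_version: str, server_version: str) -> None:
    """Fail early, naming both versions, if they do not match.

    Hypothetical: assumes the server already sent server_version during setup.
    """
    if parse_version(client_version) != parse_version(server_version):
        raise VersionMismatch(
            f"netperf {client_version} is talking to netserver {server_version}; "
            "the control protocol may be incompatible, please use matching versions"
        )


if __name__ == "__main__":
    try:
        check_peer_version("2.5.0", "2.7.0")  # the versions from this report
    except VersionMismatch as err:
        print(f"netperf: {err}")
```

Requiring an exact match is the simplest policy; a real implementation might instead compare a separate protocol version, so that releases with compatible control messages could still interoperate.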

What do you think?