cloud-bulldozer / e2e-benchmarking

Performance Tests for end Platforms
Apache License 2.0
41 stars 74 forks source link

Network perf tests run on the Nutanix Cloud errors out after waiting for some time; Need to also address the literal failure as failed. #449

Open SachinNinganure opened 2 years ago

SachinNinganure commented 2 years ago

1>a network perf test for Nutanix cloud and ended up with below error;

08-01 23:27:55.713 [pod/uperf-client-10.131.0.33-23c6c6c9-jvqkk/benchmark] 2022-08-01T17:57:45Z - CRITICAL - MainProcess - process: After 3 attempts, unable to run command: ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-rr-udp-16384-16384-1', '-P', '30300']

08-01 23:27:55.713 [pod/uperf-client-10.131.0.33-23c6c6c9-jvqkk/benchmark] 2022-08-01T17:57:45Z - CRITICAL - MainProcess - uperf: Uperf failed to run! Got results: ProcessSample(expected_rc=0, success=False, attempts=3, timeout=None, failed=[ProcessRun(rc=1, stdout='Error getting SSL CTX:1\nAllocating shared memory of size 156624 bytes\nCompleted handshake phase 1\nStarting handshake phase 2\nHandshake phase 2 with 10.131.0.33\n Done preprocessing accepts\n Sent handshake header\n Sending workorder\n

The network perf test is claiming to be success while it has actually failed. same above test outputs success while it has failed.

jtaleric commented 2 years ago

This link seems to be a internal only link.

The output leads me to think there is connectivity issues?

jtaleric commented 2 years ago
08-01 17:57:55.710  uperf-client-10.131.0.33-23c6c6c9-jvqkk   0/1     Error               0          29m
08-01 17:57:55.710  uperf-client-10.131.0.33-23c6c6c9-r9phc   0/1     ContainerCreating   0          2s

From the internal only link you provided.... can you determine why the clients went into error state for the 4p test?

jtaleric commented 2 years ago
08-01 17:57:55.714  [pod/uperf-server-3-23c6c6c9-k8jz2/benchmark] Error creating ports: Address already in use

Seems like there might be a race condition?

SachinNinganure commented 2 years ago

I will rerun and provide more info; taking the link from here.

rsevilla87 commented 2 years ago

Im not sure if this error is a consequence of https://github.com/cloud-bulldozer/e2e-benchmarking/pull/444 or otherwise the error has been present for a while and https://github.com/cloud-bulldozer/benchmark-operator/pull/782 raised it up. Im still trying to determine the cause of this issue, in my case happened in the first sample of the scenario stream-udp-16384-16384-1

BTW @SachinNinganure , what do you mean with that the benchmark claiming success when this error happen, I did run a pod2pod benchmark, that failed in a similar way, here the benchmark was marked as Failed and exit code of the script was 1.

ubuntu@ip-10-0-24-166:~/e2e-benchmarking/workloads/network-perf$ oc get benchmark
NAME              TYPE    STATE    METADATA STATE   SYSTEM METRICS   UUID                                   AGE
uperf-pod2pod-4   uperf   Failed   not collected    Not collected    00f201d7-fea6-4fa6-8d40-f205e2728780   29m

Can you verify that, please?

rsevilla87 commented 2 years ago

There's a similar issue created some time ago https://github.com/cloud-bulldozer/e2e-benchmarking/issues/247

SachinNinganure commented 2 years ago

@rsevilla87 ; In few tests I ran, even though the test got failed the pipeline did not mark the failure. from your comments I am slightly unsure if that happens every-time. I will trigger some runs to get the failure and see if pipeline reports failure.