Closed: SachinNinganure closed this issue 1 year ago.
I created an IBM Cloud 4.13 cluster while testing some other things; is the 'uperf-client-*' pod seen below what you were missing? I wonder why it wouldn't work on 4.12 but would on 4.13.
% oc get pods -n benchmark-operator
NAME READY STATUS RESTARTS AGE
backpack-37a49c24-44t64 1/1 Running 0 2m25s
backpack-37a49c24-9rwvp 1/1 Running 0 2m25s
backpack-37a49c24-c2jtw 1/1 Running 0 2m25s
backpack-37a49c24-np4h8 1/1 Running 0 2m25s
backpack-37a49c24-qvr4f 1/1 Running 0 2m25s
backpack-37a49c24-szbjd 1/1 Running 0 2m25s
benchmark-controller-manager-7d694c6b9c-94qxg 2/2 Running 0 3m41s
uperf-client-172.30.78.189-37a49c24-hbhnm 1/1 Running 0 32s
uperf-server-0-37a49c24-z4zhr 1/1 Running 0 63s
Looking in Kibana for one of the runs you listed, I do see a field that says "client_node:sn-npt-ibm412-fs4fv-worker-3-4lgbl"; not sure whether that's accurate or where that data comes from.
Run for reference: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/network-perf/370/console
@SachinNinganure Have you described the pod and gotten the events, to check what happens to it?
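For reference, standard `oc` commands that would surface those events (these need a live cluster; the pod name is taken from the listing above and would differ on a failing run):

```shell
# Inspect the pod's status, conditions and events (pod name from the listing above)
oc describe pod uperf-server-0-37a49c24-z4zhr -n benchmark-operator

# Or dump all recent events in the namespace, oldest first
oc get events -n benchmark-operator --sort-by=.metadata.creationTimestamp
```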
BTW: is this an IBM Cloud-specific issue?
@paigerube14 the test you executed also failed; the client pod did not start there either. It looks like you checked the results when the test started: the client pods looked good at the start, but not after that.
11-17 02:47:57.645 ripsaw-cli:ripsaw.models.benchmark:ERROR :: Benchmark exception: The benchmark uperf-pod2svc-2 timed out
11-17 02:47:57.645 Wed Nov 16 21:17:57 UTC 2022 Benchmark failed, dumping workload more recent logs
11-17 02:47:57.944 NAME READY STATUS RESTARTS AGE
11-17 02:47:57.944 uperf-server-0-37a49c24-psdrt 1/1 Running 0 122m
11-17 02:47:57.944 uperf-server-1-37a49c24-9sprz 1/1 Running 0 122m
11-17 02:47:58.245 Wed Nov 16 21:17:58 UTC 2022 Writing pod logs in /tmp/tmp.3mXNBirbJq/uperf-server-0-37a49c24-psdrt.log
@qiliRedHat for now I am only seeing this on IBM OCP 4.12; 4.10 and 4.11 looked good.
Investigating IBM cloud 4.12 today
Added file NPT-ibm-412.odt with the log info.
From your log file, the backpack pod had a problem worth digging into:
162m Warning Unhealthy pod/backpack-fa5f3f56-xlwlp Readiness probe failed: ls: cannot access '/tmp/indexed': No such file or directory
122m Normal Killing pod/backpack-fa5f3f56-xlwlp Stopping container backpack
In the failed Jenkins job log you provided, uperf-pod2svc-1 all completed while uperf-pod2svc-2 timed out. They are two identical runs, given that Pairs defaults to 2; I'm curious what the differences between the two runs are.
Added the benchmark-controller-manager log files from two different tests: one from AWS, where the pod2svc test succeeds, and the other from IBM 4.12, which is failing to start the client pods.
Try running with METADATA_COLLECTION=false
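Assuming METADATA_COLLECTION is read from the environment by the test launcher (the script name below is a placeholder, not the actual entry point), that would look like:

```shell
# Disable backpack metadata collection for one run to rule it out.
# The launcher script name here is hypothetical; substitute however
# you normally start the network-perf test.
export METADATA_COLLECTION=false
./run_network_perf_test.sh   # hypothetical launcher
```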
In the aws-412-npt-pass.log.gz, I see the info about the uperf-client batch Job:
{"level":"info","ts":1668759356.2989795,"logger":"proxy","msg":"Cache miss: batch/v1, Kind=Job, benchmark-operator/uperf-client-172.30.131.128-194d3529"}
While in the ibm-412-npt-fail.log.gz, I can't find any info about uperf-client. So the pod is 'missing' because the batch Job was never created.
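A quick way to confirm this from a saved controller log (a sketch; it assumes the log was dumped locally, e.g. with `oc logs ... -c manager > controller.log`, and keys off the "Cache miss" lines shown above):

```shell
# Reports whether the controller ever reconciled a uperf-client batch Job.
# A healthy run logs a "Cache miss: batch/v1, Kind=Job, .../uperf-client-*"
# line; its absence means the client Job was never created.
check_client_job() {
  if grep -q 'Cache miss: batch/v1, Kind=Job.*uperf-client' "$1"; then
    echo "uperf-client Job was created"
  else
    echo "uperf-client Job never created"
  fi
}
```

Run against the two attached logs, this should report the Job as created for aws-412-npt-pass and as never created for ibm-412-npt-fail.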
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue is still present, reopening.
METADATA_COLLECTION=false
I am running the network tests (pod2svc) on an "AWS - OVN - Customer VPC - Hybrid OS" cluster on OCP 4.13. I hit the same error: the uperf-client pods fail to start/get created.
[sninganu@sninganu ~]$ oc logs -f benchmark-controller-manager-86d495644c-x922k -c manager | less | grep "Cache miss"
{"level":"info","ts":1679406911.4110515,"logger":"proxy","msg":"Cache miss: apps/v1, Kind=DaemonSet, benchmark-operator/backpack-0cc4bd2b"}
{"level":"info","ts":1679406912.2812552,"logger":"proxy","msg":"Cache miss: apps/v1, Kind=DaemonSet, benchmark-operator/backpack-0cc4bd2b"}
{"level":"info","ts":1679406975.843491,"logger":"proxy","msg":"Cache miss: /v1, Kind=Service, benchmark-operator/uperf-service-0-0cc4bd2b"}
{"level":"info","ts":1679406976.8609834,"logger":"proxy","msg":"Cache miss: batch/v1, Kind=Job, benchmark-operator/uperf-server-0-0cc4bd2b"}
{"level":"info","ts":1679406982.1258848,"logger":"proxy","msg":"Cache miss: /v1, Kind=ConfigMap, benchmark-operator/uperf-test-0-0cc4bd2b"}
Test link --> https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/network-perf/651/parameters/
The result is the same when run manually as well.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Pod2service network perf (uperf) test for IBM Cloud on OCP 4.12 fails because the uperf-client pods are not being created.
Test --> https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/network-perf/369/console
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/network-perf/368/
@qiliRedHat @paigerube14 @mffiedler @rsevilla87