cloud-bulldozer / benchmark-operator

The Chuck Norris of cloud benchmarks
Apache License 2.0

The client pod was not created in FIO workload #789

Status: Closed (OdedViner closed this issue 2 years ago)

OdedViner commented 2 years ago

Describe the bug: The client pod was not created in the FIO workload.

To reproduce:

1. Clone the project: https://github.com/cloud-bulldozer/benchmark-operator
2. Run "make deploy".
3. Verify that the controller pod moves to the Running state.
4. Create this CR with the relevant parameters: https://github.com/Oded1990/odf-scripts/blob/main/configurations/benchmark_fio.yaml (it also exists in OCS-CI).
5. Verify that the X server pods move to the Running state.
6. Verify that the client pod moves to the Completed state. (Actual result: the client pod was never created.)

Setup: OCP 4.11, ODF 4.11, VMware

Expected behavior: The client pod is created and moves to the Completed state.

jtaleric commented 2 years ago

Can you please attach the benchmark-operator logs?

OdedViner commented 2 years ago

Logs attached: logs_benchmark-controller-manager-844dcdff69-m8n58.txt

jishii-rh commented 2 years ago

I've observed the same issue. The FIO workload fails with any set of parameters when using the latest image. After deploying images built from several commits, I found that commit 12701a6 affects the step that checks the podIP parameter. As a workaround for now, rebuild and redeploy the operator from commit 94005cd:

    # kubectl delete deployment -n benchmark-operator benchmark-controller-manager
    # git checkout 94005cd
    # make image-build image-push deploy IMG=quay.io/<username>/benchmark-operator:testing

jtaleric commented 2 years ago

It is unclear to me how num_pairs impacts this.

jtaleric commented 2 years ago

Debugging things a bit, I see the following:

    "status": {
        "phase": "Pending",
        "qosClass": "BestEffort"
    }

This output is from TASK [fio_distributed : Capture pod list].

We then go into a failed state at TASK [fio_distributed : Create IP list and nodes], which does the following:

    - name: Create IP list and nodes
      # Map each server pod's IP (status.podIP) to the node it runs on
      set_fact:
        pod_details: "{{ pod_details|default({}) | combine({item.status.podIP: item.spec.nodeName}) }}"
      with_items: "{{ server_pods.resources }}"

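The debug output shows that the server pods have no IP address yet: their status has no podIP field (the pod is still Pending), so item.status.podIP is undefined when this task runs.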
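A rough Python model of this race, for illustration only (it is not the operator's code): build_pod_details mirrors the set_fact/combine loop and fails on any pod without status.podIP, while wait_for_pod_ips sketches one possible guard that polls until every server pod reports an IP before the map is built. All function names, the retries/delay defaults, and the sample pod dicts are hypothetical; fetch_pods stands in for whatever lists the server pods (in the role, the "Capture pod list" task). In Ansible itself, the analogous guard would be an until/retries/delay loop on the pod-capture task.

```python
import time
from typing import Any, Callable

Pod = dict[str, Any]


def build_pod_details(pods: list[Pod]) -> dict[str, str]:
    """Map each server pod's IP to its node name, like the set_fact/combine loop."""
    pod_details: dict[str, str] = {}
    for item in pods:
        # Raises KeyError when a pod has no podIP yet (e.g. still Pending),
        # analogous to the undefined-variable failure in the Ansible task.
        pod_details[item["status"]["podIP"]] = item["spec"]["nodeName"]
    return pod_details


def all_have_ips(pods: list[Pod]) -> bool:
    """True only if at least one pod exists and every pod has status.podIP set."""
    return bool(pods) and all(p.get("status", {}).get("podIP") for p in pods)


def wait_for_pod_ips(fetch_pods: Callable[[], list[Pod]],
                     retries: int = 30, delay: float = 10.0) -> list[Pod]:
    """Poll fetch_pods until every pod has an IP; raise TimeoutError otherwise."""
    for attempt in range(retries):
        pods = fetch_pods()
        if all_have_ips(pods):
            return pods
        if attempt < retries - 1:
            time.sleep(delay)
    raise TimeoutError(f"server pods still missing podIP after {retries} attempts")


# Hypothetical snapshots: the first poll sees a Pending pod, the second a Running one.
pending = {"status": {"phase": "Pending", "qosClass": "BestEffort"},
           "spec": {"nodeName": "worker-0"}}
running = {"status": {"phase": "Running", "podIP": "10.128.2.15"},
           "spec": {"nodeName": "worker-0"}}

try:
    build_pod_details([pending])
except KeyError as missing:
    print("mapping failed, missing field:", missing)  # mapping failed, missing field: 'podIP'

snapshots = iter([[pending], [running]])
ready = wait_for_pod_ips(lambda: next(snapshots), retries=5, delay=0)
print(build_pod_details(ready))  # {'10.128.2.15': 'worker-0'}
```

The guard treats an empty pod list as "not ready", so it also covers the case where the server pods have not been listed yet.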

I am curious whether this is a recent problem. Does switching to SDN "fix" the issue?