cloud-bulldozer / benchmark-operator

The Chuck Norris of cloud benchmarks
Apache License 2.0

Starting benchmark fails to retrieve status #810

Status: Closed (ppeereb1 closed this issue 1 year ago)

ppeereb1 commented 1 year ago

Describe the bug: I'm trying the benchmark operator on OpenShift 4.11. After I create the Benchmark object below, the operator fails because it cannot find the complete field in the status of the Benchmark object:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: example
  namespace: benchmark-test
spec:
  system_metrics:
    collection: false
    image: 'quay.io/cloud-bulldozer/kube-burner:latest'
    index_name: system-metrics
    metrics_profile: node-metrics.yml
    prom_url: 'https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091'
    step: 30s
  metadata:
    stockpileTags:
      - common
      - k8s
      - openshift
    force: false
    ssl: false
    targeted: true
    privileged: false
    serviceaccount: default
    image: 'quay.io/cloud-bulldozer/backpack:latest'
    collection: false

oc logs benchmark-controller-manager-6f8cfdbf56-ms5jd -n openshift-operators | tail -n 100

[Setting the uuid for the benchmark] **************************************\r\n\u001b[1;30mtask path: /opt/ansible/playbooks/benchmark.yml:39\u001b[0m\n\u001b[0;36mskipping: [localhost] => {\u001b[0m\r\n\u001b[0;36m    \"changed\": false,\u001b[0m\r\n\u001b[0;36m    \"skip_reason\": \"Conditional result was False\"\u001b[0m\r\n\u001b[0;36m}\u001b[0m\n\r\nTASK [set_fact] ****************************************************************\r\n\u001b[1;30mtask path: /opt/ansible/playbooks/benchmark.yml:58\u001b[0m\n\u001b[0;36mskipping: [localhost] => {\u001b[0m\r\n\u001b[0;36m    \"changed\": false,\u001b[0m\r\n\u001b[0;36m    \"skip_reason\": \"Conditional result was False\"\u001b[0m\r\n\u001b[0;36m}\u001b[0m\n\r\nTASK [include_role : backpack] *************************************************\r\n\u001b[1;30mtask path: /opt/ansible/playbooks/benchmark.yml:74\u001b[0m\n\u001b[0;31mfatal: [localhost]: FAILED! => {\u001b[0m\r\n\u001b[0;31m    \"msg\": \"The conditional check 'benchmark_state is defined and benchmark_state.resources[0].status is defined and not benchmark_state.resources[0].status.complete|bool and (benchmark_state.resources[0].status.state is not defined or benchmark_state.resources[0].status.state != \\\"Error\\\")' failed. 
The error was: error while evaluating conditional (benchmark_state is defined and benchmark_state.resources[0].status is defined and not benchmark_state.resources[0].status.complete|bool and (benchmark_state.resources[0].status.state is not defined or benchmark_state.resources[0].status.state != \\\"Error\\\")): 'dict object' has no attribute 'complete'\\n\\nThe error appears to be in '/opt/ansible/playbooks/benchmark.yml': line 74, column 9, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n\\n      - include_role:\\n        ^ here\\n\"\u001b[0m\r\n\u001b[0;31m}\u001b[0m\n\r\nTASK [include_role : benchmark_state] ******************************************\r\n\u001b[1;30mtask path: /opt/ansible/playbooks/benchmark.yml:68\u001b[0m\n\u001b[0;31mfatal: [localhost]: FAILED! => {\u001b[0m\r\n\u001b[0;31m    \"msg\": \"The conditional check 'benchmark_state is defined and benchmark_state.resources[0].status is defined and not benchmark_state.resources[0].status.complete|bool and (benchmark_state.resources[0].status.state is not defined or benchmark_state.resources[0].status.state != \\\"Error\\\")' failed. 
The error was: error while evaluating conditional (benchmark_state is defined and benchmark_state.resources[0].status is defined and not benchmark_state.resources[0].status.complete|bool and (benchmark_state.resources[0].status.state is not defined or benchmark_state.resources[0].status.state != \\\"Error\\\")): 'dict object' has no attribute 'complete'\\n\\nThe error appears to be in '/opt/ansible/playbooks/benchmark.yml': line 68, column 9, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n    rescue:\\n      - include_role:\\n        ^ here\\n\"\u001b[0m\r\n\u001b[0;31m}\u001b[0m\n\r\nPLAY RECAP *********************************************************************\r\n\u001b[0;31mlocalhost\u001b[0m                  : \u001b[0;32mok=2   \u001b[0m changed=0    unreachable=0    \u001b[0;31mfailed=1   \u001b[0m \u001b[0;36mskipped=5   \u001b[0m \u001b[0;32mrescued=1   \u001b[0m ignored=0   \r\n\n","job":"7955079406183515637","name":"example","namespace":"benchmark-test","error":"exit status 2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\ngithub.com/operator-framework/operator-sdk/internal/ansible/runner.(*runner).Run.func1\n\t/workspace/internal/ansible/runner/runner.go:271"}
{"level":"error","ts":1681387835.3260143,"logger":"controller-runtime.manager.controller.benchmark-controller","msg":"Reconciler error","name":"example","namespace":"benchmark-test","error":"event runner on failed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:302\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99"}
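The log above is a JSON-escaped Ansible transcript with ANSI color codes, which makes the actual failure ('dict object' has no attribute 'complete') hard to spot. As a minimal sketch for reading such output, assuming a fragment of the escaped text has been copied into a Python string, the following decodes one level of escaping and strips the color codes. The helper name clean_ansible_log is hypothetical and not part of the operator or of any tool; the doubly escaped inner message (for example \\\"Error\\\") would need a second pass.

```python
import json
import re

# Matches ANSI SGR color sequences such as ESC[0;31m or ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def clean_ansible_log(escaped: str) -> str:
    """Decode one level of JSON string escaping and drop ANSI colors.

    Assumes `escaped` is the inside of a JSON string value, i.e. it
    contains no unescaped double quotes.
    """
    # Let the JSON parser handle \u001b, \r, \n and \" for us.
    text = json.loads('"' + escaped + '"')
    # Normalize Windows-style line endings left by the runner.
    text = text.replace("\r\n", "\n")
    return ANSI_SGR.sub("", text)


# A short fragment taken from the log in this issue.
sample = (
    r"TASK [include_role : backpack] ***\r\n"
    r"\u001b[0;31mfatal: [localhost]: FAILED! => {\u001b[0m\r\n"
)
print(clean_ansible_log(sample))
```

Running the same helper over the full "msg" value should make the failing conditional and the task paths (benchmark.yml:74 and benchmark.yml:68) readable as ordinary multi-line text.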

The status of the created object has no complete field:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  creationTimestamp: "2023-04-13T12:09:29Z"
  generation: 1
  name: example
  namespace: benchmark-test
  resourceVersion: "461663546"
  uid: 46f8324c-990d-46ea-a152-9f2a9e0feaa3
spec:
  metadata:
    collection: false
    force: false
    image: quay.io/cloud-bulldozer/backpack:latest
    privileged: false
    serviceaccount: default
    ssl: false
    stockpileSkipTags: []
    stockpileTags:
    - common
    - k8s
    - openshift
    targeted: true
  system_metrics:
    collection: false
    image: quay.io/cloud-bulldozer/kube-burner:latest
    index_name: system-metrics
    metrics_profile: node-metrics.yml
    prom_url: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091
    step: 30s
status:
  message: None
  system_metrics: Not collected
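The status above contains only message and system_metrics, while the conditional in the log reads status.complete directly. The following Python sketch mimics that check with plain dictionaries to show why it fails and how a default value would avoid the error. It is an illustration only, not the operator's code: the function names are hypothetical, and treating a missing complete as false is an assumption about the intended behavior (in Jinja terms this would correspond to something like complete | default(false)).

```python
from typing import Any, Mapping, Optional


def strict_check(status: Mapping[str, Any]) -> bool:
    """Mirrors the failing conditional: reads 'complete' unguarded."""
    # Raises KeyError when 'complete' is absent, analogous to Ansible's
    # "'dict object' has no attribute 'complete'".
    return not bool(status["complete"]) and status.get("state") != "Error"


def tolerant_check(status: Optional[Mapping[str, Any]]) -> bool:
    """Same logic, but treats a missing 'complete' as False (assumption)."""
    if status is None:
        return False
    complete = bool(status.get("complete", False))
    return not complete and status.get("state") != "Error"


# The status reported in this issue.
reported_status = {"message": "None", "system_metrics": "Not collected"}

try:
    strict_check(reported_status)
except KeyError as exc:
    print(f"strict check failed, missing key: {exc}")

print("tolerant check result:", tolerant_check(reported_status))
```

Whether the right remedy is a guarded conditional or an operator version that initializes complete in the status is not established in this thread.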

I've also tried setting collection: true under system_metrics, but the result was the same. How can I fix this?

stale[bot] commented 1 year ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.