cloud-bulldozer / benchmark-operator

The Chuck Norris of cloud benchmarks

[RFE] Generic pod workload #672

Closed rsevilla87 closed 2 years ago

rsevilla87 commented 2 years ago

Is your feature request related to a problem? Please describe.

There are several workloads that use a very similar code base, the only difference being the objects they deploy. e.g. the vegeta workload consists of the following steps:

Similar steps are repeated in other workloads such as:

Describe the solution you'd like

We can create a generic-workload role (name still TBD), able to perform these generic tasks; that way, the workloads using this role will only need a single task, e.g.:

```yaml
- include_role:
    name: generic_workload
  vars:
    resources: [template1.yml, template2.yml]
```
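Each entry in `resources` would simply be a Jinja2 template of the Kubernetes object a workload needs. As a rough illustration only (the file name, image, and variable names below are assumptions, not an existing benchmark-operator template), `template1.yml` could look something like:

```yaml
# template1.yml -- hypothetical templated resource consumed by the generic role
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ meta.name }}-workload"            # CR name, variable name assumed
  namespace: "{{ operator_namespace }}"       # operator namespace, variable name assumed
spec:
  template:
    spec:
      containers:
      - name: workload
        image: "{{ workload_args.image | default('quay.io/example/workload:latest') }}"
        args: ["{{ workload_args.cmd | default('--help') }}"]
      restartPolicy: Never
```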

generic_workload tasks.yml

```yaml
---
- block:

  - name: Creating workload resources
    k8s:
      state: present
      definition: "{{ lookup('template', item) }}"
    loop: "{{ resources }}"

  - include_role:
      name: benchmark_state
      tasks_from: set_state
    vars:
      state: Running

  when: benchmark_state.resources[0].status.state == "Building"

- include_role:
    name: benchmark_state
    tasks_from: completed.yml
  when: benchmark_state.resources[0].status.state == "Running"
```
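For context, the two `when` conditions follow the usual benchmark state flow: the role creates the resources while the benchmark CR is in `Building`, moves it to `Running` via `set_state`, and then hands off to the existing `benchmark_state` completion check. A sketch of the status field being polled (field and state names taken from the snippet above, not verified against the CRD):

```yaml
# Benchmark CR status consulted by the `when` conditions
status:
  state: Building   # set_state moves this to Running; completed.yml later marks it finished
```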

Thanks to this new role we can reduce the benchmark-operator code base and ease manageability. This would be a first approximation; we'll have to decide whether we want to introduce generic pod synchronization logic for the workloads deployed by this role. (At the moment that wouldn't be required, though.)

bengland2 commented 2 years ago

@rsevilla87 here are some incremental steps done in that direction:

But at present we have to explicitly provide support in each benchmark for cache dropping, redis pod synchronization, etc. There is a lot of replicated code across these benchmarks; it would be great if there were some way to make that common behavior easier to achieve.

As an aside, the state machine for running a benchmark is a real PITA to explain to people: it requires unlearning threaded programming and going back to a polling model, and doing this in Ansible YAML (I know it was done to avoid blocking benchmark-operator on a single benchmark). Would your proposal hide this state machine from people who want to integrate their favorite benchmark with benchmark-operator? One impact is that it is harder to read the benchmark-operator manager log, which repeats the same steps over and over again, making the log huge. Would it be possible to build a benchmark-operator with an ansible-runner pod and log per benchmark, rather than one thread for all benchmarks?
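For readers unfamiliar with the polling model being described: on every reconcile pass the operator re-runs the workload tasks, and each task is guarded by the benchmark CR's current state rather than blocking until the benchmark finishes. A minimal sketch of that pattern, mirroring the variable and state names from the snippet in the issue description (assumed, not copied from the operator):

```yaml
# Nothing waits; each reconcile pass only acts on whatever state the CR is in.
- name: Launch workload resources
  k8s:
    state: present
    definition: "{{ lookup('template', 'job.yml') }}"
  when: benchmark_state.resources[0].status.state == "Building"

- name: Check whether the workload has finished
  include_role:
    name: benchmark_state
    tasks_from: completed.yml
  when: benchmark_state.resources[0].status.state == "Running"
```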

An example using a simple existing benchmark would be useful, so we can compare the implementation before and after.

stale[bot] commented 2 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.