cloud-bulldozer / metadata-collector

Containerization of the Stockpile project (https://github.com/cloud-bulldozer/stockpile)
Apache License 2.0

backpack and OCS #6

Open bengland2 opened 4 years ago

bengland2 commented 4 years ago

For some OCS tests, I want to run the workload pods on separate physical machines from the OCS (OpenShift Container Storage) cluster - I think this is a problem that other folks might have too, with more than OCS. At present, backpack will be blissfully unaware that there are other nodes involved in the test and will not collect metadata about them. It would be great if there was a way to hand backpack a list of labels that mean "collect metadata about any of the nodes that have one of these labels". For example, OCS nodes are all labelled. Doing it this way would make it applicable to much more than just OCS.

dry923 commented 4 years ago

@bengland2 I have created PR https://github.com/cloud-bulldozer/ripsaw/pull/335 which would allow you to run the metadata daemonset on only specifically labeled nodes. You would have to run it in addition to the normal fio/etc. CR. Something as simple as this would work for nodes labeled foo=bar:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: backpack
  namespace: my-ripsaw
spec:
  elasticsearch:
    server: es_server
    port: 9200
  metadata:
    collection: true
    targeted: false
    label_name: foo
    label_value: bar
  workload:
    name: metadata

bengland2 commented 4 years ago

looks good to me, just have to try it out. 1 label is probably enough.

dry923 commented 4 years ago

If needed, I could probably make it loop over a list, but I haven't tried that yet.

dry923 commented 4 years ago

@bengland2 I updated the PR to take a list of labels. Let me know your thoughts.

The CR would now look like:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: backpack
  namespace: my-ripsaw
spec:
  elasticsearch:
    server: "marquez.perf.lab.eng.rdu2.redhat.com"
    port: 9200
  metadata:
    collection: true
    targeted: false
    label:
      - [ 'my', 'bar' ]
      - [ 'foo', 'bar' ]
  workload:
    name: metadata

The node(s) would have to match all of the labels given, as it's an implied AND.

bengland2 commented 4 years ago

My original post wasn't clear, but I was proposing an OR, not an AND. For a workload like CNV+OCS, there might be multiple sets of nodes to deal with (e.g. CNV compute nodes + OCS storage nodes).
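
To make the AND/OR distinction in this exchange concrete, here is a minimal Python sketch. It is not backpack's or ripsaw's code: the function names, node names, and label sets are all hypothetical, and nodes are modelled as plain dictionaries rather than queried from a cluster. It only illustrates how the same list of label pairs selects different nodes under the two interpretations.

```python
# Illustrative only: hypothetical helpers, not part of backpack or ripsaw.

def matches_all(node_labels, wanted):
    """AND semantics: the node must carry every (key, value) pair."""
    return all(node_labels.get(key) == value for key, value in wanted)


def matches_any(node_labels, wanted):
    """OR semantics: the node must carry at least one (key, value) pair."""
    return any(node_labels.get(key) == value for key, value in wanted)


# Hypothetical cluster: node name -> labels on that node.
nodes = {
    "storage-0": {"cluster.ocs.openshift.io/openshift-storage": ""},
    "compute-0": {"foo": "bar"},
    "both-0": {"cluster.ocs.openshift.io/openshift-storage": "", "foo": "bar"},
    "other-0": {"role": "infra"},
}

# Same shape as the `label:` list in the CR: a list of [name, value] pairs.
wanted = [
    ("cluster.ocs.openshift.io/openshift-storage", ""),
    ("foo", "bar"),
]

and_nodes = sorted(n for n, labels in nodes.items() if matches_all(labels, wanted))
or_nodes = sorted(n for n, labels in nodes.items() if matches_any(labels, wanted))

print("AND:", and_nodes)  # only the node carrying both labels
print("OR: ", or_nodes)   # every node carrying at least one of the labels
```

Under AND, only a node carrying both labels is selected, so separate storage and compute node sets would both be missed. Under OR, each set is picked up by its own label, which is the behaviour requested for the CNV+OCS case.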

bengland2 commented 4 years ago

I haven't tested the PR with 2 labels yet, but it certainly works with 1 label.

bengland2 commented 4 years ago

@dry923 with 2 labels it failed: backpack never runs. Here's the benchmark operator log, pretty-printed using the command below; around line 340 there is an error.

ocmr logs $(ocmr get pod | awk '/benchmark-operator/{print $1}') -c benchmark-operator \
   | ~/parse-kube-pod-log.py - > ~/benchmark-operator.log

The parse-kube-pod-log.py script is here; I just can't read a raw log from the benchmark operator, sorry.

The CR that I used is here. Again, with just 1 label it works and is usable for me, just not quite as general-purpose. This is low priority, but it would be nice to have someday. I'm using quay.io/benchmark-operator/benchmark-operator:master as of 20m ago.

dry923 commented 4 years ago

@bengland2 Thanks for the heads up. I'll take a look at why it's failing and let you know.

dry923 commented 4 years ago

@bengland2 So my initial thought was that it might be failing on the empty value in the label:

 [ 'cluster.ocs.openshift.io/openshift-storage', '']

However, I did a similar test and it worked fine. I also re-checked it with the fio CR we use for CI, adding in the labels, and it also succeeded. The logs weren't too helpful, unfortunately. Is there any other information you can give me? I.e., did backpack launch and then error, did fio error, or did the operator throw an error and simply never start anything?

Thanks!

jtaleric commented 3 years ago

Is this still an issue @bengland2 ?

bengland2 commented 3 years ago

The May 9th post is still an issue in my mind; I've just been preoccupied with other issues. The workaround is to use a single label for backpack that explicitly identifies the nodes you want to collect on, and that should be good enough for now.