cloud-bulldozer / metadata-collector

Containerization of the Stockpile project (https://github.com/cloud-bulldozer/stockpile)
Apache License 2.0

question about metadata collection #5

Closed · bengland2 closed this issue 4 years ago

bengland2 commented 4 years ago

I'm running fio with many server pods per host, and I noticed that all of them run backpack, so backpack ends up running several times per host. Is this necessary? Would it make more sense to run backpack as a daemonset and just trigger it from ripsaw? Just curious; maybe this has been discussed and I missed it. For example, here I have 7 hosts, so why run backpack 40 times?

[root@e23-h05-740xd ~]# ocmr get pod
NAME                                     READY   STATUS     RESTARTS   AGE
benchmark-operator-dc7db7f8f-bpg95       3/3     Running    0          42m
fio-server-1-benchmark-47a24a1a-kwgjm    0/1     Init:0/1   0          69s
...
fio-server-39-benchmark-47a24a1a-pmqvv   0/1     Init:0/1   0          42s

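To make the daemonset idea concrete, I'm picturing something roughly like this (just a sketch, not how ripsaw implements it today; the image, command, and env vars are copied from the pod spec further down, with the Elasticsearch server and run UUID left as placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: backpack
  namespace: my-ripsaw
spec:
  selector:
    matchLabels:
      app: backpack
  template:
    metadata:
      labels:
        app: backpack
    spec:
      containers:
      - name: backpack
        image: quay.io/cloud-bulldozer/backpack:latest
        command: ["/bin/sh", "-c"]
        args:
        - python3 stockpile-wrapper.py -s <es_server> -p 9200 -u <uuid> -n $my_node_name -N $my_pod_name
        env:
        - name: my_node_name
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: my_pod_name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
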
Also, I'm seeing pods in the Init:Error state, but I don't know how to debug them since there is no log that I'm aware of showing what happened.

...
fio-server-1-benchmark-47a24a1a-c7z2j    0/1     Init:0/1     0          47s
fio-server-1-benchmark-47a24a1a-kwgjm    0/1     Init:Error   0          5m9s
fio-server-1-benchmark-47a24a1a-zzmc7    0/1     Init:Error   0          3m1s
...

[root@e23-h05-740xd ~]# ocmr describe pod fio-server-1-benchmark-47a24a1a-kwgjm
...
Init Containers:
  backpack-47a24a1a:
    Container ID:  cri-o://62f06d39eb380188101be74d2ba5f658d058de2e47057198f741d5cd5c685348
    Image:         quay.io/cloud-bulldozer/backpack:latest
    Image ID:      quay.io/cloud-bulldozer/backpack@sha256:18296814234f0cf3e2ab1ddb4fd5a7c4455faaba3a9d7c7038ca7e5d6e9ba1d7
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      python3 stockpile-wrapper.py -s marquez.perf.lab.eng.rdu2.redhat.com -p 9200 -u 47a24a1a-f37a-5d6d-ab2a-0ece296122f4 -n $my_node_name -N $my_pod_name
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 24 Apr 2020 15:14:19 +0000
      Finished:     Fri, 24 Apr 2020 15:16:18 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      my_node_name:   (v1:spec.nodeName)
      my_pod_name:   fio-server-1-benchmark-47a24a1a-kwgjm (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from benchmark-operator-token-58wft (ro)
...

CR:

apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: fio-benchmark
  namespace: my-ripsaw
spec:
  metadata_collection: true
  elasticsearch:
    server: marquez.perf.lab.eng.rdu2.redhat.com
    port: 9200
  clustername: bene-alias-cloud14-apr-23
  test_user: bene
  workload:
    name: "fio_distributed"
    args:
      # if true, do large sequential write to preallocate volume before using
      prefill: true
      # number of times to run each test
      samples: 3
      # number of fio pods generating workload
      servers: 40
...
dry923 commented 4 years ago

@bengland2 https://github.com/cloud-bulldozer/ripsaw/pull/315 should assist with this.
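In the meantime, for the Init:Error pods: the init container's log should be retrievable the usual Kubernetes way with oc logs -c, using the pod and init container names from your describe output, e.g.:

oc logs fio-server-1-benchmark-47a24a1a-kwgjm -c backpack-47a24a1a -n my-ripsaw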

dry923 commented 4 years ago

@bengland2 Ripsaw PR 315 was just merged. Give this a try again and see if it works better for you now.

bengland2 commented 4 years ago

I think this issue is sorted out now. In one run I saw 26 backpack pods, one per OCS host, as specified by the label in the ripsaw CR.
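For anyone double-checking the same thing, listing pods with -o wide shows which node each backpack pod landed on (assuming the pod names contain "backpack", as they did in my run):

oc get pods -n my-ripsaw -o wide | grep backpack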

jtaleric commented 4 years ago

Closing based on your last comment, @bengland2.