galasa-dev / projectmanagement

Project Management repo for Issues and ZenHub

Test pods are not being cleaned up #2005

Open eamansour opened 1 month ago

eamansour commented 1 month ago

Describe the bug

Running kubectl get pods in the galasa-dev k8s namespace, where prod1 lives, shows a lot of test pods that are in the Completed state but aren't being cleaned up by the resource monitor.

Restarting the resource monitor doesn't seem to help.

> kg pods
galasa-prod1-k8s-standard-engine-c11030           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11031           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11032           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11033           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11034           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11035           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11036           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11037           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11038           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11039           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11040           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11040-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11041           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11041-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11042           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11043           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11043-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11044           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11044-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11045           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11045-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11045-4         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11046           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11046-1         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11047           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11048           0/1     Completed   0          6d9h
galasa-prod1-k8s-standard-engine-c11049           0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11050           0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11051           0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11052           0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11052-5         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11052-6         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11052-7         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11053           0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11053-2         0/1     Completed   0          6d8h
galasa-prod1-k8s-standard-engine-c11054           0/1     Completed   0          6d8h
...
galasa-prod1-k8s-standard-engine-c11419           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11420           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11421           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11422           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11423           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11424           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11424-1         0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11425           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11425-1         0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11426           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11427           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11428           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11429           0/1     Completed   0          8h
galasa-prod1-k8s-standard-engine-c11470           0/1     Completed   0          18m
galasa-prod1-k8s-standard-engine-c11471           0/1     Completed   0          10m
galasa-prod1-k8s-standard-engine-c11472           0/1     Completed   0          9m30s
galasa-prod1-k8s-standard-engine-c11473           0/1     Completed   0          6m5s
galasa-prod1-k8s-standard-engine-c11474           0/1     Completed   0          4m56s
galasa-prod1-k8s-standard-engine-c11474-1         0/1     Completed   0          4m26s

Steps to reproduce

  1. Run kubectl get pods -n galasa-dev
  2. See lots of completed pods
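To make step 2 less dependent on eyeballing the listing, here is a minimal sketch that counts the leftover pods from the JSON form of the same command. It assumes the engine pods share the name prefix seen in the output above and that a pod shown as Completed reports `status.phase` as `Succeeded`. The helper names `completed_pods` and `fetch_pod_list` are made up for illustration.

```python
import json
import subprocess


def completed_pods(pod_list, name_prefix="galasa-prod1-k8s-standard-engine-"):
    """Return sorted names of Succeeded pods whose name starts with name_prefix.

    pod_list is the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase")
        # kubectl displays phase "Succeeded" as "Completed" in the STATUS column.
        if name.startswith(name_prefix) and phase == "Succeeded":
            names.append(name)
    return sorted(names)


def fetch_pod_list(namespace="galasa-dev"):
    """Run kubectl and return the parsed pod list (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With cluster access, `len(completed_pods(fetch_pod_list()))` gives the number of completed engine pods still present. It should trend towards zero once cleanup is working.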

Expected behavior

Finished tests should be cleaned up and the pods should not exist.

eamansour commented 1 month ago

Looked into the etcd pod for one of the above runs and there are some DSS properties still set for the run, which is preventing the pod from being deleted. Finished runs should have all of their properties removed from the DSS - possibly something going wrong in the test runner or resource monitor?

/ # etcdctl get --prefix dss.framework.run.C11032
dss.framework.run.C11032.allocate.timeout
2024-09-20T06:15:18.046583935Z
dss.framework.run.C11032.allocated
2024-09-20T06:00:18.046583935Z
dss.framework.run.C11032.controller
k8s-controller
dss.framework.run.C11032.finished
2024-09-20T06:00:45.317826407Z
dss.framework.run.C11032.group
e92948b1-5237-4a2a-bd0d-054b2d22dc76
dss.framework.run.C11032.local
false
dss.framework.run.C11032.queued
2024-09-20T06:00:10.495191201Z
dss.framework.run.C11032.rasrunid
cdb-88279a5fdef52dd33f48f5f350c33263
dss.framework.run.C11032.request.type
CLI
dss.framework.run.C11032.requestor
galasadelivery@ibm.com
dss.framework.run.C11032.result
EnvFail
dss.framework.run.C11032.started
2024-09-20T06:00:35.036521921Z
dss.framework.run.C11032.status
finished
dss.framework.run.C11032.stream
inttests
dss.framework.run.C11032.test
dev.galasa.inttests/dev.galasa.inttests.sdv.local.isolated.SDVLocalJava11UbuntuIsolated
dss.framework.run.C11032.testbundle
dev.galasa.inttests
dss.framework.run.C11032.testclass
dev.galasa.inttests.sdv.local.isolated.SDVLocalJava11UbuntuIsolated
dss.framework.run.C11032.trace
true
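
The same check can be done across every run at once. The sketch below parses the output of `etcdctl get --prefix dss.framework.run.` and lists runs whose `status` is `finished` but which still have properties in the DSS. It assumes the default output format shown above (a key line followed by its value line), that no value spans multiple lines, and that the input is the command output only, without the shell prompt line. The function names are hypothetical.

```python
from collections import defaultdict

RUN_PREFIX = "dss.framework.run."


def parse_etcdctl_output(text):
    """Turn alternating key/value lines into a dict of key -> value."""
    lines = text.splitlines()
    # Keys sit at even positions; each value is on the line after its key.
    return dict(zip(lines[0::2], lines[1::2]))


def finished_runs_with_leftovers(props):
    """Return sorted run names that are finished yet still have DSS properties."""
    runs = defaultdict(dict)
    for key, value in props.items():
        if not key.startswith(RUN_PREFIX):
            continue
        # "C11032.request.type" -> run name "C11032", field "request.type"
        run_name, _, field = key[len(RUN_PREFIX):].partition(".")
        runs[run_name][field] = value
    return sorted(
        name for name, fields in runs.items()
        if fields.get("status") == "finished"
    )
```

Note that run names are upper case in the DSS (`C11032`) and lower case in the pod names (`c11032`), so lower-casing is needed when matching the two.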

eamansour commented 1 month ago

In the meantime, I've cleared the leftover properties from old runs out of the DSS. Test pods are now being cleaned up properly and nothing is being left behind in the DSS. Will keep monitoring in case this happens again.
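
How the properties were cleared is not recorded here. One possible approach, shown purely as a sketch, is to build an `etcdctl del --prefix` command per stale run and review the commands before running them inside the etcd pod. The function name `build_delete_commands` is hypothetical. The trailing dot on the prefix is a deliberate choice so that clearing `C1103` cannot also match `C11032`.

```python
RUN_PREFIX = "dss.framework.run."


def build_delete_commands(run_names):
    """Return one etcdctl argv list per run; nothing is executed here."""
    commands = []
    for name in run_names:
        # The trailing "." keeps the prefix from matching longer run names.
        prefix = f"{RUN_PREFIX}{name}."
        commands.append(["etcdctl", "del", "--prefix", prefix])
    return commands


def as_shell_lines(commands):
    """Render the commands as text so they can be reviewed before use."""
    return [" ".join(argv) for argv in commands]
```

For example, `as_shell_lines(build_delete_commands(["C11032"]))` yields `etcdctl del --prefix dss.framework.run.C11032.`. Deleting properties by hand only removes the symptom, so it is worth confirming each run really is finished first.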