cryostatio / cryostat

Secure JDK Flight Recorder management for containerized JVMs
https://cryostat.io

[Bug] Kubernetes discovery - targets not being removed after pods tear down #634

Open grzesuav opened 2 weeks ago

grzesuav commented 2 weeks ago

Current Behavior

Currently in the topology/target selection I can still see old, non-existent targets in the view, even around 5 minutes after the pods have stopped.

Expected Behavior

Targets for pods that are no longer running are removed from the topology/target view.

Steps To Reproduce

❯ k get rs
NAME                           DESIRED   CURRENT   READY   AGE
registry-556c9d5446            2         2         2       17m
registry-6878b7c78b            0         0         0       70m
registry-f459568bf             0         0         0       9d
(screenshot attached)

As you can see, ReplicaSet f459568bf is quite old and does not currently have any running pods.

❯ k get pods
NAME                                 READY   STATUS    RESTARTS   AGE
registry-556c9d5446-bzm2m            2/2     Running   0          18m
registry-556c9d5446-xh2nq            2/2     Running   0          20m
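
To reproduce from scratch, assuming a registry Deployment like the one above (the name is taken from these listings; any JMX-enabled deployment should behave the same), restarting the rollout is enough to replace the old pods:

❯ k rollout restart deployment registry
❯ k get rs

The targets for the replaced pods keep showing up in the topology/target view long afterwards.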

Environment

- OS: AKSUbuntu
- Environment: AKS 1.31
- Version: Cryostat 3.0

Anything else?

No response

andrewazores commented 2 weeks ago

@grzesuav are there any exceptions that appear in the Cryostat container logs at the time (or within some seconds after) you scale down or delete one of these deployments?

And could you paste the output from:

$ kubectl get -o yaml endpoints
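
For the logs, something along these lines should capture the relevant window (the deployment and container names here are assumptions based on a typical install, so adjust them to your setup):

$ kubectl logs deployment/cryostat -c cryostat --since=10m
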
grzesuav commented 2 weeks ago
❯ k get endpoints registry -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2024-09-02T15:26:47Z"
  name: registry
  namespace: registry
subsets:
- addresses:
  - ip: 10.184.uuu.xxx
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-mjt7n
      namespace: registry
      uid: 852fcb5b-redacted
  - ip: 10.184.fff.rrr
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-sqktd
      namespace: registry
      uid: 03b7ba07-redacted
  notReadyAddresses:
  - ip: 10.184.yyy.xxx
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-2zj9n
      namespace: registry
      uid: 66499772-redacted
  ports:
  - name: http
    port: 9000
    protocol: TCP
  - name: jfr-jmx
    port: 9091
    protocol: TCP
  - name: http-prometheus
    port: 9090
    protocol: TCP
(screenshot attached)
❯ k get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE                                   NOMINATED NODE   READINESS GATES
registry-8b68c85b8-2zj9n             2/2     Running   0          38s     10.184.   aks-nodepool0609-redacted   <none>           <none>
registry-8b68c85b8-mjt7n             2/2     Running   0          6m22s   10.184.   aks-nodepool0609-redacted   <none>           <none>
registry-8b68c85b8-sqktd             2/2     Running   0          6m43s   10.184.    aks-nodepool0609-redacted   <none>           <none>
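
To cross-check what the Endpoints object advertises against what Cryostat shows, a jsonpath query like the one below (namespace and name taken from the paste above) lists the pod names behind the ready addresses:

❯ k get endpoints registry -o jsonpath='{range .subsets[*].addresses[*]}{.targetRef.name}{"\n"}{end}'
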
grzesuav commented 2 weeks ago

I see various errors in the Cryostat logs; I will continue tomorrow and provide more details.

grzesuav commented 1 week ago

So actually today I still see the same targets as in https://github.com/cryostatio/cryostat/issues/634#issuecomment-2324993814, even though those pods haven't existed for many hours. Explore-logs-2024-09-03 12_26_01.txt

Attaching the logs which appear now when I am trying to connect.

@andrewazores what is the name of the Kubernetes discovery logger? Maybe I can filter the logs related to it to find something interesting?

andrewazores commented 1 week ago

https://github.com/cryostatio/cryostat/blob/9e2a375231b3dc9cd6ee67b0bc7af6d971fad62b/src/main/java/io/cryostat/discovery/KubeApiDiscovery.java#L81

I think the Logger's name should be io.cryostat.discovery.KubeApiDiscovery.
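
Cryostat 3.x runs on Quarkus, so that category can normally be raised to DEBUG through the standard logging config; exactly how this property is surfaced in a given Cryostat deployment is an assumption here, and it may need to be set as an environment variable on the container instead:

quarkus.log.category."io.cryostat.discovery.KubeApiDiscovery".level=DEBUG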

Possibly related: https://github.com/cryostatio/cryostat/pull/353 , https://github.com/cryostatio/cryostat/pull/396
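
For anyone skimming this later, the mechanism under discussion watches Endpoints objects like the one pasted above and derives targets from their addresses. A minimal, illustrative fabric8 sketch of that general shape (an assumption for discussion purposes, not the actual KubeApiDiscovery code) looks roughly like this:

import io.fabric8.kubernetes.api.model.Endpoints;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class EndpointsWatchSketch {
    public static void main(String[] args) throws InterruptedException {
        KubernetesClient client = new KubernetesClientBuilder().build();
        // Watch Endpoints in the namespace: a pod tearing down should arrive as a
        // MODIFIED event with that pod's address missing from subsets[].addresses.
        client.endpoints().inNamespace("registry").watch(new Watcher<Endpoints>() {
            @Override
            public void eventReceived(Action action, Endpoints ep) {
                System.out.println(action + " " + ep.getMetadata().getName());
                ep.getSubsets().forEach(subset -> {
                    subset.getAddresses().forEach(a ->
                            System.out.println("  ready:     " + a.getTargetRef().getName()));
                    subset.getNotReadyAddresses().forEach(a ->
                            System.out.println("  not ready: " + a.getTargetRef().getName()));
                });
            }

            @Override
            public void onClose(WatcherException cause) {
                // A watch that drops and is never re-established would mean no more
                // events, so stale targets would linger; worth checking the logs for this.
                System.err.println("watch closed: " + cause);
            }
        });
        // Keep the process alive so events can be observed.
        Thread.currentThread().join();
    }
}

If DEBUG logging from the category above shows removal events arriving for the deleted pods while the targets still linger in the view, the problem is more likely on the pruning side than on the watch itself.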