cryostatio / cryostat

Secure JDK Flight Recorder management for containerized JVMs
https://cryostat.io

[Bug] Duplicate discovery node for pod in container discovery #412

Closed tthvo closed 2 months ago

tthvo commented 2 months ago

Current Behavior

With PodmanDiscovery, containers in the same pod are placed under separate discovery nodes (i.e. a duplicate pod node per container) instead of sharing one.

Expected Behavior

Containers under the same pod should have a common (pod) discovery node.

Steps To Reproduce

  1. Run the smoketests: bash smoketest.bash

  2. Launch a pod:

    podman pod create --replace --name cryostat-pod

  3. Launch containers into that pod:

    podman run \
          --name jmxquarkus \
          --pod cryostat-pod \
          --label io.cryostat.discovery="true" \
          --label io.cryostat.jmxPort="51423" \
          --env QUARKUS_HTTP_PORT=10012 \
          --rm -d quay.io/roberttoyonaga/jmx:jmxquarkus@sha256:b067f29faa91312d20d43c55d194a2e076de7d0d094da3d43ee7d2b2b5a6f100
    
    podman run \
          --name vertx-fib-demo-0 \
          --env HTTP_PORT=8079 \
          --env JMX_PORT=9089 \
          --env START_DELAY=60 \
          --pod cryostat-pod \
          --label io.cryostat.discovery="true" \
          --label io.cryostat.jmxHost="vertx-fib-demo-0" \
          --label io.cryostat.jmxPort="9089" \
          --rm -d quay.io/andrewazores/vertx-fib-demo:0.13.1

Environment

- OS: Fedora
- Environment: Local smoketest with docker-compose and podman API

Anything else?

Related bugs:

tthvo commented 2 months ago

It seems the bug above is caused by a fresh pod discovery node being created for each container instead of the persisted one being reused:

https://github.com/cryostatio/cryostat3/blob/d1008a24b6ac244222dc5fc5a937cad3a6445743/src/main/java/io/cryostat/discovery/ContainerDiscovery.java#L362
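
To illustrate the suspected cause, here is a minimal sketch of the difference between always creating a pod node and reusing a stored one. It is not Cryostat's code: `SketchNode`, `PodNodeRegistry`, `alwaysCreate` and `getOrCreate` are hypothetical names, and the in-memory map only stands in for whatever store holds the persisted discovery nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a discovery node; not Cryostat's actual entity.
class SketchNode {
    final String name;
    final List<String> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }
}

// Hypothetical registry; the map is an assumption standing in for persistence.
class PodNodeRegistry {
    private final Map<String, SketchNode> persisted = new HashMap<>();

    // Buggy shape: every container gets a brand-new pod node,
    // so two containers in one pod end up under two pod nodes.
    SketchNode alwaysCreate(String podName, String container) {
        SketchNode pod = new SketchNode(podName);
        pod.children.add(container);
        return pod;
    }

    // Intended shape: look up the stored pod node first and
    // create it only when none exists for this pod name.
    SketchNode getOrCreate(String podName, String container) {
        SketchNode pod = persisted.computeIfAbsent(podName, SketchNode::new);
        pod.children.add(container);
        return pod;
    }
}

class PodNodeDemo {
    public static void main(String[] args) {
        PodNodeRegistry registry = new PodNodeRegistry();
        SketchNode a = registry.getOrCreate("cryostat-pod", "jmxquarkus");
        SketchNode b = registry.getOrCreate("cryostat-pod", "vertx-fib-demo-0");
        System.out.println(a == b);     // prints true: one shared pod node
        System.out.println(a.children); // prints [jmxquarkus, vertx-fib-demo-0]
    }
}
```

With `alwaysCreate`, the same two calls would return two distinct nodes with one child each, which matches the duplicate pod nodes described in this issue.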

Not sure on the deletion issues tho...

tthvo commented 2 months ago

> Not sure on the deletion issues tho...

It looks like if the container does not specify the io.cryostat.jmxHost label, Cryostat queries the Podman API for its hostname. By that point the container has already shut down, so the query returns null.
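
A rough sketch of a null-safe version of that lookup follows. Only the `io.cryostat.jmxHost` label name comes from this issue. `JmxHostResolver`, `resolve` and the `hostnameLookup` function are hypothetical, and the lookup merely stands in for the Podman API query, which is assumed to return null once the container is gone.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; not part of Cryostat.
class JmxHostResolver {
    static final String JMX_HOST_LABEL = "io.cryostat.jmxHost";

    // Prefer the label; otherwise ask the lookup (a stand-in for the
    // Podman API query). An empty Optional means "hostname unknown",
    // e.g. because the container has already shut down.
    static Optional<String> resolve(
            Map<String, String> labels,
            String containerId,
            Function<String, String> hostnameLookup) {
        String fromLabel = labels.get(JMX_HOST_LABEL);
        if (fromLabel != null && !fromLabel.isBlank()) {
            return Optional.of(fromLabel);
        }
        return Optional.ofNullable(hostnameLookup.apply(containerId));
    }
}
```

Returning an `Optional` forces the caller on the deletion path to decide what to do when the hostname is unavailable, rather than passing a null along.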