cryostatio / cryostat

Secure JDK Flight Recorder management for containerized JVMs
https://cryostat.io
Other
12 stars 10 forks source link

[Bug] Failed to generate the JVM identifier #696

Open grzesuav opened 2 days ago

grzesuav commented 2 days ago

Current Behavior

On certain pods I see message Failed to generate the JVM identifier -

image image

I guess it can be related with underlying error in logs 2024-10-24 08:40:47,211 WARNING [io.cry.cor.jmc.JMCAgentJMXHelper] (executor-thread-1537) Could not retrieve event probes: javax.management.InstanceNotFoundException: org.openjdk.jmc.jfr.agent:type=AgentController

when trying to capture recording I see also console issue -

image

however recording is created fine -

image

the error seems to be related to HTTP Request to /api/v3/targets/31/recordings failed, error id: e8c0c380-fdc9-4961-b7b1-fbb46b44b230-4: jakarta.transaction.RollbackException: ARJUNA016053: Could not commit transaction.

Attaching logs - cryostat-Explore-logs-2024-10-24 10_47_03.txt

Expected Behavior

No issue with JVM identifier generation, no issue when creating recording (or more meaningful one)

Steps To Reproduce

No response

Environment

- OS: Ubuntu
- Environment: Kubernetes Azure 1.30
- Version: Cryostat 3

Anything else?

No response

grzesuav commented 2 days ago

For the record, modules present in the affected target are

java.base@17.0.11
java.compiler@17.0.11
java.datatransfer@17.0.11
java.desktop@17.0.11
java.instrument@17.0.11
java.logging@17.0.11
java.management.rmi@17.0.11
java.management@17.0.11
java.naming@17.0.11
java.net.http@17.0.11
java.prefs@17.0.11
java.rmi@17.0.11
java.scripting@17.0.11
java.se@17.0.11
java.security.jgss@17.0.11
java.security.sasl@17.0.11
java.sql.rowset@17.0.11
java.sql@17.0.11
java.transaction.xa@17.0.11
java.xml.crypto@17.0.11
java.xml@17.0.11
jdk.attach@17.0.11
jdk.crypto.ec@17.0.11
jdk.dynalink@17.0.11
jdk.httpserver@17.0.11
jdk.internal.jvmstat@17.0.11
jdk.jcmd@17.0.11
jdk.jdwp.agent@17.0.11
jdk.jfr@17.0.11
jdk.jstatd@17.0.11
jdk.localedata@17.0.11
jdk.management.agent@17.0.11
jdk.management.jfr@17.0.11
jdk.management@17.0.11
jdk.security.auth@17.0.11
jdk.unsupported@17.0.11
jdk.zipfs@17.0.11
grzesuav commented 1 day ago

listed via

java --list-modules

I have another application with the same module set and it is working fine, no JMV identifier issue and no UI error when taking recording

grzesuav commented 1 day ago

I guess the error message refers to this JMX module - https://github.com/openjdk/jmc/blob/jmc9/application/org.openjdk.jmc.console.agent/src/main/java/org/openjdk/jmc/console/agent/AgentJmxHelper.java#L55 ?

andrewazores commented 1 day ago

2024-10-24 08:40:47,211 WARNING [io.cry.cor.jmc.JMCAgentJMXHelper] (executor-thread-1537) Could not retrieve event probes: javax.management.InstanceNotFoundException: org.openjdk.jmc.jfr.agent:type=AgentController

https://github.com/openjdk/jmc/blob/jmc9/application/org.openjdk.jmc.console.agent/src/main/java/org/openjdk/jmc/console/agent/AgentJmxHelper.java#L55

These are referring to the JMC Agent integration feature, which is not really related to the JVM ID problem you're experiencing. If you don't have the JMC Agent installed in your application then seeing messages like this is expected behaviour.


As for the JVM ID being missing, this could be a connection timing problem. In 3.0, Cryostat tries to connect to the target JVM immediately when it is discovered so that it can compute the JVM's hash ID. Sometimes this means it is actually connecting prematurely - if using Kubernetes Endpoints discovery for example, it's possible that Kubernetes creates the Endpoints object and informs Cryostat about it before the target JVM is actually ready to accept incoming JMX connections, so when Cryostat sees the target and tries to connect then it fails to do so and the ID gets left blank. Later on the target JVM becomes ready and Cryostat is able to connect to it when you try to start a recording or perform other actions, but the ID never gets updated.

https://github.com/cryostatio/cryostat/pull/689 should also address this kind of bug, if my guess at the behaviour above is accurate in this particular case.


HTTP Request to /api/v3/targets/31/recordings failed, error id: e8c0c380-fdc9-4961-b7b1-fbb46b44b230-4: jakarta.transaction.RollbackException: ARJUNA016053: Could not commit transaction

This looks more like a problem:

2024-10-24 10:45:40.616 2024-10-24 08:45:40,614 INFO  [io.qua.htt.access-log] (executor-thread-1553) 83.27.6.117 - - [24/Oct/2024:08:45:40 +0000] "POST /api/v3/targets/31/recordings HTTP/1.1" 500 72
2024-10-24 10:45:40.616     ... 24 more
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1295)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:91)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:360)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:76)
2024-10-24 10:45:40.616     at org.hibernate.resource.transaction.backend.jta.internal.synchronization.RegisteredSynchronization.beforeCompletion(RegisteredSynchronization.java:37)
2024-10-24 10:45:40.616     at org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorNonTrackingImpl.beforeCompletion(SynchronizationCallbackCoordinatorNonTrackingImpl.java:47)
2024-10-24 10:45:40.616     at org.hibernate.resource.transaction.backend.jta.internal.JtaTransactionCoordinatorImpl.beforeCompletion(JtaTransactionCoordinatorImpl.java:336)
2024-10-24 10:45:40.616     at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:439)
2024-10-24 10:45:40.616     at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:1966)
2024-10-24 10:45:40.616     at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:2301)
2024-10-24 10:45:40.616     at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:485)
2024-10-24 10:45:40.616     at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1412)
2024-10-24 10:45:40.616     at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:127)
2024-10-24 10:45:40.616     at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39)
2024-10-24 10:45:40.616     at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:364)
2024-10-24 10:45:40.616     at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:502)
2024-10-24 10:45:40.616     at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:635)
2024-10-24 10:45:40.616     at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:115)
2024-10-24 10:45:40.616     at org.hibernate.action.internal.EntityInsertAction.postInsert(EntityInsertAction.java:176)
2024-10-24 10:45:40.616     at org.hibernate.event.service.internal.EventListenerGroupImpl.fireLazyEventOnEachListener(EventListenerGroupImpl.java:116)
2024-10-24 10:45:40.616     at org.hibernate.event.internal.PostInsertEventListenerStandardImpl.onPostInsert(PostInsertEventListenerStandardImpl.java:31)
2024-10-24 10:45:40.616     at org.hibernate.jpa.event.internal.CallbackRegistryImpl.postCreate(CallbackRegistryImpl.java:81)
2024-10-24 10:45:40.616     at org.hibernate.jpa.event.internal.CallbackRegistryImpl.callback(CallbackRegistryImpl.java:123)
2024-10-24 10:45:40.616     at org.hibernate.jpa.event.internal.ListenerCallback.performCallback(ListenerCallback.java:55)
2024-10-24 10:45:40.616     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2024-10-24 10:45:40.616     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-10-24 10:45:40.616     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2024-10-24 10:45:40.616     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-10-24 10:45:40.616     at io.cryostat.recordings.ActiveRecording_Listener_ClientProxy.postPersist(Unknown Source)
2024-10-24 10:45:40.616     at io.cryostat.recordings.ActiveRecording$Listener.postPersist(ActiveRecording.java:172)
2024-10-24 10:45:40.616     at io.cryostat.recordings.ActiveRecording$Listener$ActiveRecordingEvent$Payload.of(ActiveRecording.java:228)
2024-10-24 10:45:40.616     at io.cryostat.recordings.ActiveRecording$Listener$ActiveRecordingEvent$Payload.<init>(ActiveRecording.java:222)
2024-10-24 10:45:40.616     at java.base/java.util.Objects.requireNonNull(Objects.java:209)
2024-10-24 10:45:40.616 Caused by: java.lang.NullPointerException
2024-10-24 10:45:40.616     at java.base/java.lang.Thread.run(Thread.java:840)
2024-10-24 10:45:40.616     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2024-10-24 10:45:40.616     at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
2024-10-24 10:45:40.616     at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
2024-10-24 10:45:40.616     at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
2024-10-24 10:45:40.616     at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
2024-10-24 10:45:40.616     at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
2024-10-24 10:45:40.616     at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
2024-10-24 10:45:40.616     at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
2024-10-24 10:45:40.616     at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
2024-10-24 10:45:40.616     at io.cryostat.recordings.Recordings$quarkusrestinvoker$createRecording_3a6d003fcdb6a24416e86a17a2a45440af6f87f7.invoke(Unknown Source)
2024-10-24 10:45:40.616     at io.cryostat.recordings.Recordings_Subclass.createRecording(Unknown Source)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:171)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.endTransaction(TransactionalInterceptorBase.java:406)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.commit(NotifyingTransactionManager.java:70)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:128)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1307)
2024-10-24 10:45:40.616 2024-10-24 08:45:40,614 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1553) HTTP Request to /api/v3/targets/31/recordings failed, error id: e8c0c380-fdc9-4961-b7b1-fbb46b44b230-4: jakarta.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
2024-10-24 10:45:40.616 2024-10-24T08:45:40.616464845Z stdout F 
2024-10-24 10:45:40.616     at java.base/java.lang.Thread.run(Thread.java:840)
2024-10-24 10:45:40.616     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2024-10-24 10:45:40.616     at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
2024-10-24 10:45:40.616     at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
2024-10-24 10:45:40.616     at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
2024-10-24 10:45:40.616     at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
2024-10-24 10:45:40.616     at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
2024-10-24 10:45:40.616     at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
2024-10-24 10:45:40.616     at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
2024-10-24 10:45:40.616     at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
2024-10-24 10:45:40.616     at io.cryostat.recordings.Recordings$quarkusrestinvoker$createRecording_3a6d003fcdb6a24416e86a17a2a45440af6f87f7.invoke(Unknown Source)
2024-10-24 10:45:40.616     at io.cryostat.recordings.Recordings_Subclass.createRecording(Unknown Source)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
2024-10-24 10:45:40.616     at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:171)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.endTransaction(TransactionalInterceptorBase.java:406)
2024-10-24 10:45:40.616     at io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.commit(NotifyingTransactionManager.java:70)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:128)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1295)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:91)
2024-10-24 10:45:40.616     at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:360)
2024-10-24 10:45:40.616     at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:76)
2024-10-24 10:45:40.616     at org.hibernate.resource.transaction.backend.jta.internal.synchronization.RegisteredSynchronization.beforeCompletion(RegisteredSynchronization.java:37)
2024-10-24 10:45:40.616     at org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorNonTrackingImpl.beforeCompletion(SynchronizationCallbackCoordinatorNonTrackingImpl.java:47)
2024-10-24 10:45:40.614     at org.hibernate.resource.transaction.backend.jta.internal.JtaTransactionCoordinatorImpl.beforeCompletion(JtaTransactionCoordinatorImpl.java:336)
2024-10-24 10:45:40.614     at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:439)
2024-10-24 10:45:40.614     at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:1966)
2024-10-24 10:45:40.614     at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:2301)
2024-10-24 10:45:40.614     at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:485)
2024-10-24 10:45:40.614     at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1412)
2024-10-24 10:45:40.614     at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:127)
2024-10-24 10:45:40.614     at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39)
2024-10-24 10:45:40.614     at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:364)
2024-10-24 10:45:40.614     at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:502)
2024-10-24 10:45:40.614     at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:635)
2024-10-24 10:45:40.614     at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:115)
2024-10-24 10:45:40.614     at org.hibernate.action.internal.EntityInsertAction.postInsert(EntityInsertAction.java:176)
2024-10-24 10:45:40.614     at org.hibernate.event.service.internal.EventListenerGroupImpl.fireLazyEventOnEachListener(EventListenerGroupImpl.java:116)
2024-10-24 10:45:40.614     at org.hibernate.event.internal.PostInsertEventListenerStandardImpl.onPostInsert(PostInsertEventListenerStandardImpl.java:31)
2024-10-24 10:45:40.614     at org.hibernate.jpa.event.internal.CallbackRegistryImpl.postCreate(CallbackRegistryImpl.java:81)
2024-10-24 10:45:40.613     at org.hibernate.jpa.event.internal.CallbackRegistryImpl.callback(CallbackRegistryImpl.java:123)
2024-10-24 10:45:40.613     at org.hibernate.jpa.event.internal.ListenerCallback.performCallback(ListenerCallback.java:55)
2024-10-24 10:45:40.613     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2024-10-24 10:45:40.613     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-10-24 10:45:40.613     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2024-10-24 10:45:40.613     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-10-24 10:45:40.613     at io.cryostat.recordings.ActiveRecording_Listener_ClientProxy.postPersist(Unknown Source)
2024-10-24 10:45:40.613     at io.cryostat.recordings.ActiveRecording$Listener.postPersist(ActiveRecording.java:172)
2024-10-24 10:45:40.613     at io.cryostat.recordings.ActiveRecording$Listener$ActiveRecordingEvent$Payload.of(ActiveRecording.java:228)
2024-10-24 10:45:40.613     at io.cryostat.recordings.ActiveRecording$Listener$ActiveRecordingEvent$Payload.<init>(ActiveRecording.java:222)
2024-10-24 10:45:40.613     at java.base/java.util.Objects.requireNonNull(Objects.java:209)
2024-10-24 10:45:40.613 2024-10-24 08:45:40,612 WARN  [com.arj.ats.arjuna] (executor-thread-1553) ARJUNA012125: TwoPhaseCoordinator.beforeCompletion - failed for SynchronizationImple< 0:ffff0ab81249:9561:6717d85f:36e1b, org.hibernate.resource.transaction.backend.jta.internal.synchronization.RegisteredSynchronization@6d6d8089 >: java.lang.NullPointerException

It's not really clear from this what caused the transaction rollback, just that there was some NPE. Some part of the stack trace seems to point to https://github.com/cryostatio/cryostat/blob/4f32d4f3b78b62acd5810499532cd1ffc165e37f/src/main/java/io/cryostat/recordings/ActiveRecording.java#L172 , but I'm not quite sure how something could be null at this point.

Is this "Request failed (500)" error consistently reproducible?

grzesuav commented 1 day ago

yes, for the same group where there is no JVM identfiers I got this. On my staging env I updated cryostat to 3.0.1 , will re-check with this version

andrewazores commented 1 day ago

I don't think 3.0.1 contained any patches that would affect these specific bugs, but I'd be curious to see if it does help somehow.

grzesuav commented 1 day ago

yeah, it still happening. It is misleading as recording gets started despite the error message

image
andrewazores commented 1 day ago

Cryostat creates the remote recording over JMX (or Agent HTTP, if you're using the Cryostat Agent instead of JMX) and then creates a database record for itself to keep track of that recording, so what you're seeing looks like Cryostat is able to connect to the target and perform the JMX operation, but then it fails when trying to create the database entry. So the remote recording does actually get created, but the Cryostat API operation and database transaction failed.

There was another somewhat related bug some time ago so now when you try to do various API operations on a target, Cryostat connects to the target and tries to list whatever remote recordings may exist on it and synchronize its database to that data at the later time.

So it should look like this:

  1. API request to Cryostat C to create a recording on Target T
  2. C opens JMX connection to T
  3. C asks T to start a recording
  4. C creates database entry E
  5. C responds to API request with OK
  6. Later, API request to C to list recordings (ex. when your web-client goes back to the active recording table)
  7. C connects to T to list recordings
  8. C updates database to reflect what it finds there, if anything has changed
  9. C responds to API request with list

In this case it looks like step 4 fails somehow and results in step 5 being an HTTP 500 response, but everything else is working.

The NPE actually makes a lot of sense now that I step through that, and with your observation that this happens specifically to the targets that don't have the JVM ID set:

https://github.com/cryostatio/cryostat/blob/4f32d4f3b78b62acd5810499532cd1ffc165e37f/src/main/java/io/cryostat/recordings/ActiveRecording.java#L230

https://github.com/cryostatio/cryostat/blob/4f32d4f3b78b62acd5810499532cd1ffc165e37f/src/main/java/io/cryostat/recordings/ActiveRecording.java#L223

so the database record creation fails because part of that transaction involves emitting a notification that the record has been created, and that notification references the target's JVM ID and requires it to be non-null. This post-persist notification hook therefore fails, and this rolls back the transaction.

So if that analysis is correct, then https://github.com/cryostatio/cryostat/pull/689 should fix the bug. This won't land in a release until 4.0 however which is months away. We are planning a 3.0.2 release in the meantime as well, but I will need to strip down that patch to something less invasive before I can backport it.


In the meantime, you might be able to work around this issue by creating a dummy Stored Credential. If you go to Security and create a credential with a match expression that matches these targets where the JVM ID is not present (or simply use the expression true), then this should trigger Cryostat to try reconnecting to those targets and recalculating their IDs.