cryostatio / cryostat

Other
6 stars 8 forks source link

[Bug] HTTP 504 Gateway Timeout and "action completed with multiple threads" when starting a recording #538

Open andrewazores opened 4 days ago

andrewazores commented 4 days ago

Current Behavior

When attempting to create a new recording on a target application, Cryostat responded with an HTTP 504 Gateway Timeout response and the recording was not created. The Cryostat pod logs show an odd stack trace.

Expected Behavior

The recording should be successfully created on the target.

Steps To Reproduce

No response

Environment

No response

Anything else?

2024-06-27 17:04:25,812 INFO  [io.qua.htt.access-log] (executor-thread-1839) 10.22.8.104 - - [27/Jun/2024:17:04:25 +0000] "POST /api/v1/targets/service%3Ajmx%3Armi%3A%2F%2F%2Fjndi%2Frmi%3A%2F%2F10-130-0-175.namespace.pod%3A9091%2Fjmxrmi/recordings HTTP/1.1" 308 -
2024-06-27 17:04:26,029 WARN  [io.cry.tar.TargetConnectionManager] (executor-thread-1839) java.io.IOException: The client has been closed.
2024-06-27 17:04:33,639 INFO  [io.qua.htt.access-log] (executor-thread-1848) 10.129.0.1 - - [27/Jun/2024:17:04:33 +0000] "GET /health/liveness HTTP/1.1" 204 -
2024-06-27 17:04:36,039 WARN  [com.arj.ats.arjuna] (executor-thread-1839) ARJUNA012095: Abort of action id 0:ffff0a8100d7:9219:667b2756:2e3d8 invoked while multiple threads active within it.
2024-06-27 17:04:36,047 WARN  [com.arj.ats.arjuna] (executor-thread-1839) ARJUNA012381: Action id 0:ffff0a8100d7:9219:667b2756:2e3d8 completed with multiple threads - thread executor-thread-1846 was in progress with java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.11/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:717)
java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1074)
java.base@17.0.11/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:276)
io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:63)
io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
io.cryostat.targets.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:207)
io.cryostat.targets.TargetConnectionManager_ClientProxy.executeConnectedTask(Unknown Source)
io.cryostat.recordings.RecordingOptionsBuilderFactory.create(RecordingOptionsBuilderFactory.java:36)
io.cryostat.recordings.RecordingOptionsBuilderFactory_ClientProxy.create(Unknown Source)
io.cryostat.recordings.RecordingHelper.lambda$startRecordingImpl$10(RecordingHelper.java:360)
io.cryostat.recordings.RecordingHelper$$Lambda$2417/0x00007f3fbfdae000.execute(Unknown Source)
io.cryostat.targets.TargetConnectionManager.lambda$executeConnectedTaskUni$1(TargetConnectionManager.java:201)
io.cryostat.targets.TargetConnectionManager$$Lambda$2333/0x00007f3fbfd43580.apply(Unknown Source)
io.smallrye.mutiny.unchecked.UncheckedFunction.lambda$toFunction$0(UncheckedFunction.java:45)
io.smallrye.mutiny.unchecked.UncheckedFunction$$Lambda$2334/0x00007f3fbfd437d8.apply(Unknown Source)
io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:63)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription$$Lambda$2335/0x00007f3fbfd4c000.accept(Unknown Source)
java.base@17.0.11/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
java.base@17.0.11/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887)
java.base@17.0.11/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325)
java.base@17.0.11/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:144)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forward(UniCreateFromCompletionStage.java:51)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage.subscribe(UniCreateFromCompletionStage.java:35)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnItemConsume.subscribe(UniOnItemConsume.java:30)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.converters.uni.UniToMultiPublisher$UniToMultiSubscription.request(UniToMultiPublisher.java:73)
io.smallrye.mutiny.subscription.SwitchableSubscriptionSubscriber.setOrSwitchUpstream(SwitchableSubscriptionSubscriber.java:205)
io.smallrye.mutiny.subscription.SwitchableSubscriptionSubscriber.onSubscribe(SwitchableSubscriptionSubscriber.java:107)
io.smallrye.mutiny.converters.uni.UniToMultiPublisher.subscribe(UniToMultiPublisher.java:25)
io.smallrye.mutiny.groups.MultiCreate$1.subscribe(MultiCreate.java:165)
io.smallrye.mutiny.operators.multi.MultiRetryWhenOp$RetryWhenOperator.resubscribe(MultiRetryWhenOp.java:157)
io.smallrye.mutiny.operators.multi.MultiRetryWhenOp$TriggerSubscriber.onNext(MultiRetryWhenOp.java:188)
io.smallrye.mutiny.helpers.HalfSerializer.onNext(HalfSerializer.java:30)
io.smallrye.mutiny.helpers.StrictMultiSubscriber.onItem(StrictMultiSubscriber.java:84)
io.smallrye.mutiny.operators.multi.MultiConcatMapOp$ConcatMapMainSubscriber.tryEmit(MultiConcatMapOp.java:182)
io.smallrye.mutiny.operators.multi.MultiConcatMapOp$ConcatMapInner.onItem(MultiConcatMapOp.java:285)
io.smallrye.mutiny.subscription.MultiSubscriber.onNext(MultiSubscriber.java:61)
io.smallrye.mutiny.converters.uni.UniToMultiPublisher$UniToMultiSubscription.onItem(UniToMultiPublisher.java:92)
io.smallrye.mutiny.operators.uni.UniDelayOnItem$UniDelayOnItemProcessor.lambda$onItem$0(UniDelayOnItem.java:53)
io.smallrye.mutiny.operators.uni.UniDelayOnItem$UniDelayOnItemProcessor$$Lambda$2969/0x00007f3fbff3b210.run(Unknown Source)
org.jboss.threads.EnhancedQueueExecutor$RunnableScheduledFuture.performTask(EnhancedQueueExecutor.java:2892)
org.jboss.threads.EnhancedQueueExecutor$RunnableScheduledFuture.performTask(EnhancedQueueExecutor.java:2883)
org.jboss.threads.EnhancedQueueExecutor$AbstractScheduledFuture.run(EnhancedQueueExecutor.java:2741)
io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base@17.0.11/java.lang.Thread.run(Thread.java:840)

2024-06-27 17:04:36,047 WARN  [com.arj.ats.arjuna] (executor-thread-1839) ARJUNA012381: Action id 0:ffff0a8100d7:9219:667b2756:2e3d8 completed with multiple threads - thread executor-thread-1839 was in progress with java.base/java.lang.Thread.getStackTrace(Thread.java:1619)
com.arjuna.ats.arjuna.coordinator.BasicAction.createStackTraces(BasicAction.java:3406)
com.arjuna.ats.arjuna.coordinator.BasicAction.checkChildren(BasicAction.java:3443)
com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1672)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:124)
com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:186)
com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1377)
com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:145)
io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.rollback(NotifyingTransactionManager.java:94)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.endTransaction(TransactionalInterceptorBase.java:404)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:175)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
io.cryostat.recordings.Recordings_Subclass.createRecording(Unknown Source)
io.cryostat.recordings.Recordings$quarkusrestinvoker$createRecording_3a6d003fcdb6a24416e86a17a2a45440af6f87f7.invoke(Unknown Source)
org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:840)

2024-06-27 17:04:36,047 WARN  [com.arj.ats.arjuna] (executor-thread-1839) ARJUNA012381: Action id 0:ffff0a8100d7:9219:667b2756:2e3d8 completed with multiple threads - thread executor-thread-1849 was in progress with io.cryostat.targets.TargetConnectionManager.lambda$executeConnectedTaskUni$1(TargetConnectionManager.java:199)
io.cryostat.targets.TargetConnectionManager$$Lambda$2333/0x00007f3fbfd43580.apply(Unknown Source)
io.smallrye.mutiny.unchecked.UncheckedFunction.lambda$toFunction$0(UncheckedFunction.java:45)
io.smallrye.mutiny.unchecked.UncheckedFunction$$Lambda$2334/0x00007f3fbfd437d8.apply(Unknown Source)
io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:63)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription$$Lambda$2335/0x00007f3fbfd4c000.accept(Unknown Source)
java.base@17.0.11/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
java.base@17.0.11/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
java.base@17.0.11/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
java.base@17.0.11/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base@17.0.11/java.lang.Thread.run(Thread.java:840)

2024-06-27 17:04:36,047 WARN  [com.arj.ats.arjuna] (executor-thread-1839) ARJUNA012108: CheckedAction::check - atomic action 0:ffff0a8100d7:9219:667b2756:2e3d8 aborting with 3 threads active!
2024-06-27 17:04:36,055 WARN  [com.arj.ats.jta] (executor-thread-1839) ARJUNA016045: attempted rollback of < formatId=131077, gtrid_length=35, bqual_length=36, tx_uid=0:ffff0a8100d7:9219:667b2756:2e3d8, node_name=quarkus, branch_uid=0:ffff0a8100d7:9219:667b2756:2e3db, subordinatenodename=null, eis_name=0 > (io.agroal.narayana.LocalXAResource@1d0011fd) failed with exception code XAException.XAER_RMERR: javax.transaction.xa.XAException: Error trying to transactionRollback local transaction: Enlisted connection used without active transaction
    at io.agroal.narayana.LocalXAResource.xaException(LocalXAResource.java:140)
    at io.agroal.narayana.LocalXAResource.xaException(LocalXAResource.java:134)
    at io.agroal.narayana.LocalXAResource.rollback(LocalXAResource.java:88)
    at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:362)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3037)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3016)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1679)
    at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:124)
    at com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:186)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1377)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:145)
    at io.quarkus.narayana.jta.runtime.NotifyingTransactionManager.rollback(NotifyingTransactionManager.java:94)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.endTransaction(TransactionalInterceptorBase.java:404)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:175)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
    at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
    at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
    at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
    at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
    at io.cryostat.recordings.Recordings_Subclass.createRecording(Unknown Source)
    at io.cryostat.recordings.Recordings$quarkusrestinvoker$createRecording_3a6d003fcdb6a24416e86a17a2a45440af6f87f7.invoke(Unknown Source)
    at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
    at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:141)
    at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.sql.SQLException: Enlisted connection used without active transaction
    at io.agroal.pool.ConnectionHandler.verifyEnlistment(ConnectionHandler.java:398)
    at io.agroal.pool.ConnectionHandler.transactionRollback(ConnectionHandler.java:366)
    at io.agroal.narayana.LocalXAResource.rollback(LocalXAResource.java:85)
    ... 31 more

2024-06-27 17:04:36,056 WARN  [io.cry.ExceptionMappers] (executor-thread-1839) io.smallrye.mutiny.TimeoutException
2024-06-27 17:04:36,057 INFO  [io.qua.htt.access-log] (executor-thread-1839) 10.22.8.104 - - [27/Jun/2024:17:04:36 +0000] "POST /api/v3/targets/1/recordings HTTP/1.1" 504 -
andrewazores commented 4 days ago

Looks similar: https://github.com/quarkusio/quarkus/issues/38913

According to this, the "multiple threads" message may just be a side effect of how the actual operation timeout is reported.

andrewazores commented 4 days ago

After this error, trying to create another recording with the same name fails:

2024-06-27 17:20:43,812 WARN [io.cry.ExceptionMappers] (executor-thread-1856) io.cryostat.util.EntityExistsException: Recording with name continue already exists. Try again with a different name
2024-06-27 17:20:43,812 INFO [io.qua.htt.access-log] (executor-thread-1856) 10.22.8.104 - - [27/Jun/2024:17:20:43 +0000] "POST /api/v3/targets/1/recordings HTTP/1.1" 409 76

and upon further investigation, the original request actually did result in a new recording being created on the target. Re-visiting the active recordings table results in Cryostat re-querying the target and finding that the recording does exist. However, since this is re-discovered after the fact, Cryostat thinks this recording was created externally and it has no labels associated with it.