bloxbean / yaci

A Cardano Mini Protocols implementation in Java
MIT License

LocalStateQueryClient stuck in Acquiring state #19

Closed: iFergal closed this issue 11 months ago

iFergal commented 1 year ago

[yaci 0.1.12, cardano-node 1.35.7 (I had an issue syncing preprod with 8.x.x a while back and haven't moved back up to check whether it was fixed)]

Sometimes the LocalStateQueryClient gets into an unrecoverable state in my application.

Only one thread accesses this area of the code (the calling block is synchronized). Because I have seen this before (the client stuck in the Idle or Acquiring state, etc.), I added the following before every call to the client:

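```java
// Called before every query: release any held snapshot, then acquire a fresh one.
private void releaseAndAcquireSnapshot(){
    try {
        // Release failures are logged and otherwise ignored.
        localStateQueryClient.release().block(Duration.ofSeconds(5));
    } catch (Exception e) {
        log.error("Fail to release snapshot");
    }
    try {
        localStateQueryClient.acquire().block(Duration.ofSeconds(5));
    } catch (Exception e) {
        log.error("Fail to acquire snapshot");
        // Acquire failures are rethrown to the caller.
        throw e;
    }
}
```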
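For context, the Ouroboros local state query mini-protocol is a small state machine in which each message is legal only in particular states. The sketch below is a simplified model written from the protocol description, not yaci's `LocalStateQueryAgent`; the class and method names are hypothetical. It shows that `MsgRelease` is valid only from Acquired and `MsgAcquire` only from Idle (re-acquiring from Acquired uses a separate `MsgReAcquire`), so the release step in the snippet above is expected to fail harmlessly when the client is already Idle. It also shows that in Acquiring and Querying only a server reply can move the state on.

```java
// Hypothetical, simplified model of the local state query mini-protocol states.
public class LocalStateQueryModel {
    public enum State { IDLE, ACQUIRING, ACQUIRED, QUERYING, DONE }

    // Each message is tagged with the state it is legal in and the state it leads to.
    public enum Msg {
        ACQUIRE(State.IDLE, State.ACQUIRING),       // sent by client
        ACQUIRED(State.ACQUIRING, State.ACQUIRED),  // sent by server
        FAILURE(State.ACQUIRING, State.IDLE),       // sent by server
        QUERY(State.ACQUIRED, State.QUERYING),      // sent by client
        RESULT(State.QUERYING, State.ACQUIRED),     // sent by server
        REACQUIRE(State.ACQUIRED, State.ACQUIRING), // sent by client
        RELEASE(State.ACQUIRED, State.IDLE),        // sent by client
        DONE(State.IDLE, State.DONE);               // sent by client

        final State from;
        final State to;

        Msg(State from, State to) {
            this.from = from;
            this.to = to;
        }
    }

    private State current = State.IDLE;

    public State state() {
        return current;
    }

    // Rejects any message that is not legal in the current state, loosely
    // mirroring the "doesn't support this message" check in the stack trace.
    public void apply(Msg msg) {
        if (msg.from != current) {
            throw new IllegalStateException(
                "Current state [" + current + "] doesn't support this message : " + msg);
        }
        current = msg.to;
    }

    public static void main(String[] args) {
        LocalStateQueryModel m = new LocalStateQueryModel();
        try {
            m.apply(Msg.RELEASE); // rejected: nothing is acquired while IDLE
        } catch (IllegalStateException e) {
            System.out.println("release rejected: " + e.getMessage());
        }
        m.apply(Msg.ACQUIRE);  // IDLE -> ACQUIRING
        m.apply(Msg.ACQUIRED); // server reply: ACQUIRING -> ACQUIRED
        System.out.println(m.state());
    }
}
```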

Even so, I still hit this error:

2023-08-24T15:45:20.521Z ERROR 1 --- [bmitterThread-1] o.c.m.service.impl.LocalNodeServiceImpl  : Fail to release snapshot
2023-08-24T15:45:20.521Z ERROR 1 --- [bmitterThread-1] o.c.m.service.impl.LocalNodeServiceImpl  : Fail to acquire snapshot
java.lang.IllegalStateException: Current state [Querying] doesn't support this message : MsgAcquire(point=null)
        at com.bloxbean.cardano.yaci.core.protocol.State.verifyMessageType(State.java:35)
        at com.bloxbean.cardano.yaci.core.protocol.localstate.LocalStateQueryAgent.acquire(LocalStateQueryAgent.java:158)
        at com.bloxbean.cardano.yaci.helper.LocalStateQueryClient.lambda$acquire$1(LocalStateQueryClient.java:130)
        at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:58)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4485)
        at reactor.core.publisher.Mono.block(Mono.java:1733)
        at org.cardanofoundation.mb.service.impl.LocalNodeServiceImpl.releaseAndAcquireSnapshot(LocalNodeServiceImpl.java:50)
        at org.cardanofoundation.mb.service.impl.LocalNodeServiceImpl.queryUTXOs(LocalNodeServiceImpl.java:61)
        at org.cardanofoundation.mb.service.impl.UtxoServiceImpl.getUnusedUtxosSortByAmount(UtxoServiceImpl.java:34)
        at org.cardanofoundation.mb.service.impl.BatchConsumptionServiceImpl.getAppropriateTxIn(BatchConsumptionServiceImpl.java:412)
        at org.cardanofoundation.mb.service.impl.BatchConsumptionServiceImpl.getTheClientWalletInfo(BatchConsumptionServiceImpl.java:703)
        at org.cardanofoundation.mb.service.impl.BatchConsumptionServiceImpl.submitBatchToNode(BatchConsumptionServiceImpl.java:494)
        at org.cardanofoundation.mb.service.impl.BatchConsumptionServiceImpl.consumeBasedOnTime(BatchConsumptionServiceImpl.java:192)
        at jdk.internal.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:391)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:702)
        at org.cardanofoundation.mb.service.impl.BatchConsumptionServiceImpl$$SpringCGLIB$$0.consumeBasedOnTime(<generated>)
        at org.cardanofoundation.mb.task.TimeBasedBatchConsumptionTask.run(TimeBasedBatchConsumptionTask.java:41)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
        Suppressed: java.lang.Exception: #block terminated with an error
                at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:139)
                at reactor.core.publisher.Mono.block(Mono.java:1734)
                ... 29 more
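One hedged reading of this trace: the agent is still in the Querying state, where only the server's reply to an outstanding query can move it back to Acquired, so client messages such as `MsgRelease` and `MsgAcquire` are rejected. If an earlier query's caller gave up (for example after a `block(Duration)` timeout) and the reply was never processed, nothing would return the agent to Acquired. The sketch below models that scenario with `CompletableFuture` standing in for the Reactor `Mono`; `StuckQueryDemo`, `sendQuery`, `release`, and `acquire` are hypothetical names, and whether yaci actually behaves this way is an assumption the trace does not prove.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model: a query whose reply never arrives leaves the agent in QUERYING.
public class StuckQueryDemo {
    public enum State { IDLE, ACQUIRING, ACQUIRED, QUERYING }

    private volatile State state = State.ACQUIRED; // assume a snapshot is already held

    public State state() {
        return state;
    }

    // Sends a query; the state only returns to ACQUIRED once the server's reply is delivered.
    public CompletableFuture<String> sendQuery() {
        if (state != State.ACQUIRED) {
            throw new IllegalStateException("Current state [" + state + "] doesn't support this message : MsgQuery");
        }
        state = State.QUERYING;
        // Nothing in this demo ever completes serverReply, modelling a reply that is never processed.
        CompletableFuture<String> serverReply = new CompletableFuture<>();
        return serverReply.thenApply(result -> {
            state = State.ACQUIRED;
            return result;
        });
    }

    // MsgRelease is only legal from ACQUIRED.
    public void release() {
        if (state != State.ACQUIRED) {
            throw new IllegalStateException("Current state [" + state + "] doesn't support this message : MsgRelease");
        }
        state = State.IDLE;
    }

    // MsgAcquire is only legal from IDLE.
    public void acquire() {
        if (state != State.IDLE) {
            throw new IllegalStateException("Current state [" + state + "] doesn't support this message : MsgAcquire");
        }
        state = State.ACQUIRING;
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        StuckQueryDemo agent = new StuckQueryDemo();
        CompletableFuture<String> reply = agent.sendQuery();
        try {
            reply.get(100, TimeUnit.MILLISECONDS); // the caller gives up waiting
        } catch (TimeoutException e) {
            System.out.println("caller timed out; state is still " + agent.state());
        }
        try {
            agent.release();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        try {
            agent.acquire();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```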

It seems to be stuck while acquiring, even though the node was up (the exception reports the agent in the Querying state). This is a job scheduling and batching service: whenever I push more jobs, or its time-based batching kicks in as it did here (consumeBasedOnTime), the same error occurs until I restart the entire service.

It seems like it's stuck in a bad state and the connection needs to be restarted.
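Since the only way out appears to be a fresh connection, one possible application-side workaround is to treat an `IllegalStateException` from the acquire step as fatal for that connection: close it, open a new one, and retry once. The sketch below is hypothetical: `QueryConnection`, its `acquire`/`close` methods, and the `Supplier` factory are placeholders for however the application creates its yaci client, not yaci's API.

```java
import java.util.function.Supplier;

// Hypothetical wrapper that rebuilds a wedged connection instead of restarting the service.
public class ReconnectingQueryClient {
    // Placeholder for a node connection plus its state query client.
    public interface QueryConnection extends AutoCloseable {
        void acquire();          // may throw IllegalStateException if the agent is wedged
        @Override void close();  // narrowed: no checked exception
    }

    private final Supplier<QueryConnection> factory;
    private QueryConnection connection;

    public ReconnectingQueryClient(Supplier<QueryConnection> factory) {
        this.factory = factory;
        this.connection = factory.get();
    }

    // Tries to acquire; on a protocol-state error, rebuilds the connection and retries once.
    public synchronized QueryConnection acquireOrReconnect() {
        try {
            connection.acquire();
        } catch (IllegalStateException e) {
            connection.close();
            connection = factory.get();
            connection.acquire(); // a second failure propagates to the caller
        }
        return connection;
    }
}
```

Retrying only once keeps a persistent problem (for example, the node being down) from turning into a tight reconnect loop; the caller still sees the second failure.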