DescartesResearch / TeaStore

A micro-service reference test application for model extraction, cloud management, energy efficiency, power prediction, single- and multi-tier auto-scaling
https://se.informatik.uni-wuerzburg.de
Apache License 2.0
118 stars 136 forks source link

Uncatched com.netflix.client.ClientException #233

Closed SoerenHenning closed 1 year ago

SoerenHenning commented 1 year ago

Occasionally, I'm getting the following error in the Recommender service:

$ kubectl logs -f teastore-recommender-769989c8c-tls62 teastore-recommender 
Setting heap space to 512000 KiB
18-Aug-2022 14:18:20.197 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/10.0.7
18-Aug-2022 14:18:20.204 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Jun 8 2021 16:31:06 UTC
18-Aug-2022 14:18:20.204 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 10.0.7.0
18-Aug-2022 14:18:20.204 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
18-Aug-2022 14:18:20.205 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            4.19.0-18-amd64
18-Aug-2022 14:18:20.205 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
18-Aug-2022 14:18:20.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/local/openjdk-11
18-Aug-2022 14:18:20.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           11.0.11+9
18-Aug-2022 14:18:20.295 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
18-Aug-2022 14:18:20.296 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
18-Aug-2022 14:18:20.296 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
18-Aug-2022 14:18:20.397 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties
18-Aug-2022 14:18:20.398 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
18-Aug-2022 14:18:20.398 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -javaagent:/kieker/agent/agent.jar
18-Aug-2022 14:18:20.398 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: --add-opens=java.base/java.lang=ALL-UNNAMED
18-Aug-2022 14:18:20.398 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dkieker.monitoring.configuration=/kieker/config/kieker.monitoring.properties
18-Aug-2022 14:18:20.399 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Daj.weaving.verbose=false
18-Aug-2022 14:18:20.399 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.aspectj.weaver.loadtime.configuration=aop.xml
18-Aug-2022 14:18:20.399 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dkieker.monitoring.skipDefaultAOPConfiguration=true
18-Aug-2022 14:18:20.399 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Daj.weaving.loadersToSkip=java.net.URLClassLoader
18-Aug-2022 14:18:20.400 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.jndi.ldap.object.disableEndpointIdentification=true
18-Aug-2022 14:18:20.400 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
18-Aug-2022 14:18:20.400 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
18-Aug-2022 14:18:20.400 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
18-Aug-2022 14:18:20.400 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=
18-Aug-2022 14:18:20.401 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/usr/local/tomcat
18-Aug-2022 14:18:20.401 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
18-Aug-2022 14:18:20.401 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/usr/local/tomcat/temp
18-Aug-2022 14:18:20.410 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded Apache Tomcat Native library [1.2.30] using APR version [1.6.5].
18-Aug-2022 14:18:20.410 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true], UDS [true].
18-Aug-2022 14:18:20.496 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.1d  10 Sep 2019]
18-Aug-2022 14:18:23.210 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
18-Aug-2022 14:18:23.409 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-127.0.0.1-8009"]
18-Aug-2022 14:18:23.495 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [4980] milliseconds
18-Aug-2022 14:18:24.001 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
18-Aug-2022 14:18:24.002 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/10.0.7]
18-Aug-2022 14:18:24.115 INFO [main] org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive [/usr/local/tomcat/webapps/tools.descartes.teastore.recommender.war]
18-Aug-2022 14:18:34.505 INFO [main] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
[main] WARN com.netflix.config.sources.URLConfigurationSource - No URLs will be polled as dynamic configuration sources.
[main] INFO com.netflix.config.sources.URLConfigurationSource - To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
[main] INFO com.netflix.loadbalancer.BaseLoadBalancer - Client:  instantiated a LoadBalancer: {NFLoadBalancer:name=,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}
[main] INFO tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer - Pre-initializing client-side load balancer for target: tools.descartes.teastore.persistence
[main] INFO tools.descartes.teastore.recommender.servlet.RecommenderStartup - Recommender loop time not set. Disabling periodic retraining.
18-Aug-2022 14:18:43.906 WARNING [main] org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [105] milliseconds.
[main] INFO kieker.monitoring.core.configuration.ConfigurationFactory - Loading configuration from JVM-specified location: '/kieker/config/kieker.monitoring.properties'
[main] INFO kieker.monitoring.core.controller.TCPController - Could not parse port for the TCPController, deactivating this option. Received string was: 
WriterController: WriterClassName: kieker.monitoring.writer.filesystem.FileWriter
[main] INFO kieker.monitoring.core.controller.MonitoringController - Current State of kieker.monitoring (1.15-SNAPSHOT) Status: 'disabled'
        Name: 'KIEKER'; Hostname: 'teastore-recommender-769989c8c-tls62'; experimentID: '1'
JMXController: JMX disabled
TimeSource: 'kieker.monitoring.timer.SystemNanoTimer'
        Time in nanoseconds (with nanoseconds precision) since Thu Jan 01 00:00:00 UTC 1970'
ProbeController: disabled
WriterController:
        Queue type: class kieker.monitoring.queue.BlockingQueueDecorator
        Queue capacity: 10000
        Insert behavior (a.k.a. QueueFullBehavior): class kieker.monitoring.queue.behavior.BlockOnFailedInsertBehavior
                numBlocked: 0
Writer: 'kieker.monitoring.writer.filesystem.FileWriter'
        Configuration:
                kieker.monitoring.writer.filesystem.FileWriter.logFilePoolHandler='kieker.monitoring.writer.filesystem.RotatingLogFilePoolHandler'
                kieker.monitoring.writer.filesystem.FileWriter.PrioritizedQueueSize='100'
                kieker.monitoring.writer.filesystem.FileWriter.charsetName='UTF-8'
                kieker.monitoring.writer.filesystem.FileWriter.logStreamHandler='kieker.monitoring.writer.filesystem.TextLogStreamHandler'
                kieker.monitoring.writer.filesystem.FileWriter.bufferSize='8192'
                kieker.monitoring.writer.filesystem.FileWriter.maxEntriesInFile='25000'
                kieker.monitoring.writer.filesystem.FileWriter.maxLogFiles='-1'
                kieker.monitoring.writer.filesystem.FileWriter.MaxShutdownDelay='-1'
                kieker.monitoring.writer.filesystem.FileWriter.maxLogSize='-1'
                kieker.monitoring.writer.filesystem.FileWriter.mapFileHandler='kieker.monitoring.writer.filesystem.TextMapFileHandler'
                kieker.monitoring.writer.filesystem.FileWriter.flush='true'
                kieker.monitoring.writer.filesystem.FileWriter.QueueFullBehavior='0'
                kieker.monitoring.writer.filesystem.FileWriter.QueueSize='10000'
                kieker.monitoring.writer.filesystem.FileWriter.customStoragePath='/kieker/logs/'
                kieker.monitoring.writer.filesystem.FileWriter.actualStoragePath='/kieker/logs/kieker-20220818-141844-17827135508390104-UTC--KIEKER'

        Automatic assignment of logging timestamps: 'true'
Sampling Controller: Periodic Sensor available: Poolsize: '0'; Scheduled Tasks: '0'
[main] INFO kieker.monitoring.core.registry.ControlFlowRegistry - First threadId will be 8136737889263288320
18-Aug-2022 14:18:49.806 WARNING [main] org.glassfish.jersey.server.wadl.WadlFeature.configure JAX-B API not found . WADL feature is disabled.
18-Aug-2022 14:18:52.808 INFO [main] org.apache.catalina.startup.HostConfig.deployWAR Deployment of web application archive [/usr/local/tomcat/webapps/tools.descartes.teastore.recommender.war] has finished in [28,693] ms
18-Aug-2022 14:18:52.901 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
18-Aug-2022 14:18:53.103 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-127.0.0.1-8009"]
18-Aug-2022 14:18:53.199 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [29703] milliseconds
com.netflix.client.ClientException: Number of retries on next server exceeded max 2 retries, while making a call for: teastore-persistence:8080
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$4.call(LoadBalancerCommand.java:352)
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$4.call(LoadBalancerCommand.java:345)
        at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
        at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber$1$1.onError(OperatorRetryWithPredicate.java:107)
        at rx.observers.SerializedObserver.onError(SerializedObserver.java:152)
        at rx.observers.SerializedSubscriber.onError(SerializedSubscriber.java:78)
        at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.innerError(OnSubscribeConcatMap.java:192)
        at rx.internal.operators.OnSubscribeConcatMap$ConcatMapInnerSubscriber.onError(OnSubscribeConcatMap.java:340)
        at rx.Observable.unsafeSubscribe(Observable.java:10263)
        at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.drain(OnSubscribeConcatMap.java:286)
        at rx.internal.operators.OnSubscribeConcatMap$ConcatMapSubscriber.onNext(OnSubscribeConcatMap.java:144)
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:185)
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:180)
        at rx.Observable.unsafeSubscribe(Observable.java:10256)
        at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:94)
        at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:42)
        at rx.Observable.unsafeSubscribe(Observable.java:10256)
        at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber$1.call(OperatorRetryWithPredicate.java:127)
        at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.enqueue(TrampolineScheduler.java:73)
        at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.schedule(TrampolineScheduler.java:52)
        at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:79)
        at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:45)
        at rx.internal.util.ScalarSynchronousObservable$WeakSingleProducer.request(ScalarSynchronousObservable.java:276)
        at rx.Subscriber.setProducer(Subscriber.java:209)
        at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:138)
        at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:129)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
        at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
        at rx.Observable.subscribe(Observable.java:10352)
        at rx.Observable.subscribe(Observable.java:10319)
        at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:443)
        at rx.observables.BlockingObservable.first(BlockingObservable.java:166)
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer.loadBalanceRESTOperation(ServiceLoadBalancer.java:230)
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer.loadBalanceRESTOperation(ServiceLoadBalancer.java:201)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations.getEntities_aroundBody10(LoadBalancedCRUDOperations.java:202)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations$AjcClosure11.run(LoadBalancedCRUDOperations.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:167)
        at tools.descartes.teastore.kieker.probes.AbstractOperationExecutionWithParameterAspect.operation(AbstractOperationExecutionWithParameterAspect.java:57)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations.getEntities(LoadBalancedCRUDOperations.java:202)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer.retrieveDataAndRetrain_aroundBody8(TrainingSynchronizer.java:184)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer$AjcClosure9.run(TrainingSynchronizer.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:167)
        at tools.descartes.teastore.kieker.probes.AbstractOperationExecutionWithParameterAspect.operation(AbstractOperationExecutionWithParameterAspect.java:57)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer.retrieveDataAndRetrain(TrainingSynchronizer.java:175)
        at tools.descartes.teastore.recommender.servlet.RecommenderStartup.lambda$contextInitialized$0(RecommenderStartup.java:74)
        at tools.descartes.teastore.registryclient.StartupCallbackTask.run(StartupCallbackTask.java:59)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: jakarta.ws.rs.ProcessingException: Timeout exceeded
        at org.glassfish.jersey.grizzly.connector.GrizzlyConnector.apply(GrizzlyConnector.java:236)
        at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297)
        at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:662)
        at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697)
        at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
        at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691)
        at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:661)
        at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:413)
        at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:313)
        at tools.descartes.teastore.registryclient.rest.NonBalancedCRUDOperations.getEntities(NonBalancedCRUDOperations.java:230)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations.lambda$getEntities$5(LoadBalancedCRUDOperations.java:203)
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancerResult.fromRESTOperation(ServiceLoadBalancerResult.java:52)
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer.lambda$loadBalanceRESTOperation$1(ServiceLoadBalancer.java:223)
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:303)
        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:287)
        at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:231)
        at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:228)
        at rx.Observable.unsafeSubscribe(Observable.java:10256)
        ... 51 more
Caused by: java.util.concurrent.TimeoutException: Timeout exceeded
        at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.timeout(GrizzlyAsyncHttpProvider.java:434)
        at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$3.onTimeout(GrizzlyAsyncHttpProvider.java:282)
        at org.glassfish.grizzly.utils.IdleTimeoutFilter$DefaultWorker.doWork(IdleTimeoutFilter.java:323)
        at org.glassfish.grizzly.utils.IdleTimeoutFilter$DefaultWorker.doWork(IdleTimeoutFilter.java:305)
        at org.glassfish.grizzly.utils.DelayedExecutor$DelayedRunnable.run(DelayedExecutor.java:132)
        ... 3 more
java.lang.NullPointerException: ServiceLoadBalancerResult was null!
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer.loadBalanceRESTOperation(ServiceLoadBalancer.java:232)
        at tools.descartes.teastore.registryclient.loadbalancers.ServiceLoadBalancer.loadBalanceRESTOperation(ServiceLoadBalancer.java:201)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations.getEntities_aroundBody10(LoadBalancedCRUDOperations.java:202)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations$AjcClosure11.run(LoadBalancedCRUDOperations.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:167)
        at tools.descartes.teastore.kieker.probes.AbstractOperationExecutionWithParameterAspect.operation(AbstractOperationExecutionWithParameterAspect.java:57)
        at tools.descartes.teastore.registryclient.rest.LoadBalancedCRUDOperations.getEntities(LoadBalancedCRUDOperations.java:202)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer.retrieveDataAndRetrain_aroundBody8(TrainingSynchronizer.java:184)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer$AjcClosure9.run(TrainingSynchronizer.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:167)
        at tools.descartes.teastore.kieker.probes.AbstractOperationExecutionWithParameterAspect.operation(AbstractOperationExecutionWithParameterAspect.java:57)
        at tools.descartes.teastore.recommender.servlet.TrainingSynchronizer.retrieveDataAndRetrain(TrainingSynchronizer.java:175)
        at tools.descartes.teastore.recommender.servlet.RecommenderStartup.lambda$contextInitialized$0(RecommenderStartup.java:74)
        at tools.descartes.teastore.registryclient.StartupCallbackTask.run(StartupCallbackTask.java:59)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

I cannot really say, in which scenarios the exception is thrown, but if the problem occurs, the /train/isready endpoint always return false. It can then usually be solved by 1-2 restarts. Maybe it is sufficient to add the ClientException to the catch block:

https://github.com/DescartesResearch/TeaStore/blob/c1554c7fbfa9eb55c2bfc952d839f1cabe6e3c66/services/tools.descartes.teastore.recommender/src/main/java/tools/descartes/teastore/recommender/servlet/TrainingSynchronizer.java#L183-L192


My setup is basically the same as in #230 and my original problem, described there, might have also been partially related to this issue.

SimonEismann commented 1 year ago

Looking at this, it seems like this error would occur if the database is signaling ready, but then the recommender does not receive any data during the training phase.

I've added a retry mechanism here and added the NullpointerException to the exception block (it seems to think that there will never be a ClientException in that block). Could you checkout the version from development and see if this fixed the issue?

SoerenHenning commented 1 year ago

Thanks! Is there a Docker image to test your fix?

SimonEismann commented 1 year ago

It seems the github actions for the commit are stuck in queueing, I've retriggered and seems like it is working again so the new images should be available via the :development tag.

Let me know if this already resolves your issue. If it's about temporary connection issues this should resolve it. If the recommends can permanently not reach the persistence despite it being labeled as online, then we will need to take an in depth look at what is going on.

SoerenHenning commented 1 year ago

After running a bunch of experiments, I did not see that issue again so I think it's solved.

SimonEismann commented 1 year ago

great to hear, feel free to reopen the issue in case it does pop back up :)