GoogleCloudPlatform / spring-cloud-gcp

New home for Spring Cloud GCP development starting with version 2.0.
Apache License 2.0
429 stars 319 forks source link

Spring GCP subscribers will softlock against large queues with "Exactly Once" delivery enabled. #2491

Closed chases2 closed 10 months ago

chases2 commented 11 months ago

Describe the bug If you boot a GCP PubSub consumer under these circumstances...

... then the consumer will fail to consume with no errors, potentially for hours.

Exact versions are in the pom.xml of the example below.

Sample A sample project with installation instructions can be found here: https://github.com/chases2/ahab-sturdy-pancake

meltsufin commented 11 months ago

Do you feel this is caused by something in Spring Cloud GCP, or in the underlying Pub/Sub client library, or the Pub/Sub service itself?

chases2 commented 11 months ago

I suspect the Spring GCP libraries, but I don't know for certain yet. I'm planning on adding a Java subscriber that doesn't use any of these libraries to the example project.

chases2 commented 10 months ago

I've also been able to find the following error in the example code when it does fail. It will usually run for long hours at a time and not process messages, but will occasionally throw the following exception and restart:

textPayload: "2024-01-07T13:50:12.639Z ERROR 1 --- [bscriber-SE-1-1] c.g.c.util.concurrent.AbstractFuture     : RuntimeException while executing runnable CallbackListener{com.google.api.core.ApiFutures$1@2c27f549} with executor MoreExecutors.directExecutor()"

"java.util.ConcurrentModificationException: null
        at java.base/java.util.LinkedHashMap$LinkedHashIterator.nextNode(Unknown Source) ~[na:na]
        at java.base/java.util.LinkedHashMap$LinkedEntryIterator.next(Unknown Source) ~[na:na]
        at java.base/java.util.LinkedHashMap$LinkedEntryIterator.next(Unknown Source) ~[na:na]
        at com.google.cloud.pubsub.v1.MessageDispatcher.notifyAckSuccess(MessageDispatcher.java:423) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.cloud.pubsub.v1.StreamingSubscriberConnection$2.onSuccess(StreamingSubscriberConnection.java:530) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.cloud.pubsub.v1.StreamingSubscriberConnection$2.onSuccess(StreamingSubscriberConnection.java:523) ~[google-cloud-pubsub-1.125.8.jar:1.125.8]
        at com.google.api.core.ApiFutures$1.onSuccess(ApiFutures.java:89) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1138) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at com.google.api.gax.retrying.BasicRetryingFuture.handleAttempt(BasicRetryingFuture.java:203) ~[gax-2.36.0.jar:2.36.0]
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:135) ~[gax-2.36.0.jar:2.36.0]
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:115) ~[gax-2.36.0.jar:2.36.0]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at com.google.api.core.AbstractApiFuture$InternalSettableFuture.set(AbstractApiFuture.java:87) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.api.core.AbstractApiFuture.set(AbstractApiFuture.java:70) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onSuccess(GrpcExceptionCallable.java:88) ~[gax-grpc-2.36.0.jar:2.36.0]
        at com.google.api.core.ApiFutures$1.onSuccess(ApiFutures.java:89) ~[api-common-2.19.0.jar:2.19.0]
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1138) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038) ~[guava-31.1-jre.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:783) ~[guava-31.1-jre.jar:na]
        at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:563) ~[grpc-stub-1.58.0.jar:1.58.0]
        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:536) ~[grpc-stub-1.58.0.jar:1.58.0]
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-api-1.58.0.jar:1.58.0]
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-api-1.58.0.jar:1.58.0]
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-api-1.58.0.jar:1.58.0]
        at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:570) ~[gax-grpc-2.36.0.jar:2.36.0]
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:567) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:71) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:735) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:716) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.58.0.jar:1.58.0]
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.58.0.jar:1.58.0]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:na]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]"
timestamp: "2024-01-07T13:50:12.651112623Z"

These all seem to be in the GCP library or its dependencies (grpc, gax), but it's not clear to me if this is a bug in those dependencies or in the way that Spring GCP calls these dependencies. What do you think?

burkedavison commented 10 months ago

This looks like an issue that has been resolved in google-cloud-pubsub 1.125.12, which is available in libraries-bom 26.28.0, which is available in Spring Cloud GCP 3.7.5, 4.9.0, and 5.0.0.

Could you verify which version you are using, and update if prior to one of the listed releases?

chases2 commented 10 months ago

As indicated in the original bug, the bug is reproduced with these versions:

<java.version>17</java.version>
<spring-cloud-gcp.version>4.8.3</spring-cloud-gcp.version>
<spring-cloud.version>2022.0.4</spring-cloud.version>

Updating to these versions with the exact same code make the issue not reproducible anymore:

<java.version>21</java.version>
<spring-cloud-gcp.version>5.0.0</spring-cloud-gcp.version>
<spring-cloud.version>2023.0.0</spring-cloud.version>

I'll be updating the demonstration code with the known fix(es).

chases2 commented 10 months ago

Demonstration code and fixes: https://github.com/chases2/ahab-sturdy-pancake