apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.29k stars 3.59k forks source link

Sink instances die (and do not restart automatically) due to "Bookie handle is not available" error #9428

Closed fmiguelez closed 1 year ago

fmiguelez commented 3 years ago

Describe the bug We have a cluster with 3 brokers and 3 bookies. We are facing issues with our sink that writes to an external DB. This sink reads from 36 topics using 3 instances. Each topic data goes to a DB table.

The thing is that now and then our instances die with error below (that does not reach our sink code). Eventually all three instances die.

20:00:37.699 [pulsar-client-io-1-2] WARN org.apache.pulsar.client.impl.BinaryProtoLookupService - [persistent://dbus/tdf/measure-route-15min] failed to get schema : org.apache.pulsar.client.api.PulsarClientException: java.io.IOException: Bookie handle is not available - ledger=47 - operation=Failed to read entry - entry=0 java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException: java.io.IOException: Bookie handle is not available - ledger=47 - operation=Failed to read entry - entry=0 at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:967) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_275] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) [?:1.8.0_275] at org.apache.pulsar.client.impl.ClientCnx.handleGetSchemaResponse(ClientCnx.java:736) [org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:373) [org.apache.pulsar-pulsar-common-2.7.0.jar:2.7.0] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [io.netty-netty-codec-4.1.51.Final.jar:4.1.51.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [io.netty-netty-codec-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275] Caused by: org.apache.pulsar.client.api.PulsarClientException: java.io.IOException: Bookie handle is not available - ledger=47 - operation=Failed to read entry - entry=0 at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1031) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.ClientCnx.lambda$sendGetSchema$13(ClientCnx.java:803) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) ~[?:1.8.0_275] ... 24 more 20:00:37.707 [dbus/tdf/histsink-vertica1-1] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [dbus/tdf/histsink-vertica1:1] Uncaught exception in Java Instance com.google.common.util.concurrent.UncheckedExecutionException: org.apache.commons.lang3.SerializationException: Failed at fetching schema info for 0 at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache.get(LocalCache.java:3951) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4935) ~[com.google.guava-guava-30.0-jre.jar:?] at org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.read(AbstractMultiVersionReader.java:86) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.AbstractStructSchema.decode(AbstractStructSchema.java:60) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.AutoConsumeSchema.decode(AutoConsumeSchema.java:97) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.AutoConsumeSchema.decode(AutoConsumeSchema.java:40) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.MessageImpl.getValue(MessageImpl.java:301) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.functions.source.PulsarRecord.getValue(PulsarRecord.java:74) ~[org.apache.pulsar-pulsar-functions-instance-2.7.0.jar:2.7.0] at org.apache.pulsar.functions.instance.JavaInstanceRunnable.readInput(JavaInstanceRunnable.java:400) ~[org.apache.pulsar-pulsar-functions-instance-2.7.0.jar:?] at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:237) [org.apache.pulsar-pulsar-functions-instance-2.7.0.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275] Caused by: org.apache.commons.lang3.SerializationException: Failed at fetching schema info for 0 at org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader.getSchemaInfoByVersion(AbstractMultiVersionReader.java:125) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.generic.MultiVersionGenericAvroReader.loadReader(MultiVersionGenericAvroReader.java:43) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(AbstractMultiVersionReader.java:52) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.schema.reader.AbstractMultiVersionReader$1.load(AbstractMultiVersionReader.java:49) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155) ~[com.google.guava-guava-30.0-jre.jar:?] at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045) ~[com.google.guava-guava-30.0-jre.jar:?] ... 12 more Caused by: org.apache.pulsar.client.api.PulsarClientException: java.io.IOException: Bookie handle is not available - ledger=47 - operation=Failed to read entry - entry=0 at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1031) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.client.impl.ClientCnx.lambda$sendGetSchema$13(ClientCnx.java:803) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_275] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_275] at org.apache.pulsar.client.impl.ClientCnx.handleGetSchemaResponse(ClientCnx.java:736) ~[org.apache.pulsar-pulsar-client-original-2.7.0.jar:2.7.0] at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:373) ~[org.apache.pulsar-pulsar-common-2.7.0.jar:2.7.0] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) ~[io.netty-netty-codec-4.1.51.Final.jar:4.1.51.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) ~[io.netty-netty-codec-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[io.netty-netty-transport-4.1.51.Final.jar:4.1.51.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) ~[io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) ~[io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[io.netty-netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar:4.1.51.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.51.Final.jar:4.1.51.Final] ... 1 more 20:00:37.712 [dbus/tdf/histsink-vertica1-1] INFO org.apache.pulsar.functions.instance.JavaInstanceRunnable - Closing instance 20:00:37.792 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-15min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.793 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-1min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.795 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-30min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.796 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-5min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.797 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-day] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.798 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-hour] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.799 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-month] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.800 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-raw] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.800 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-det-year] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.801 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-15min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.802 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-1min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.803 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-30min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.804 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-5min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.804 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-day] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.806 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-hour] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.807 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-month] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.808 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-raw] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.808 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-link-year] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.809 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-15min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.810 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-1min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.811 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-30min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.812 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-5min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.813 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-day] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.813 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-hour] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.814 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-month] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.815 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-raw] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.816 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-point-year] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.817 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-15min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.818 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-1min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.819 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-30min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.820 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-5min] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.821 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-day] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.823 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-hour] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.824 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-month] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.825 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-raw] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.826 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [persistent://dbus/tdf/measure-route-year] [sub-tdf-histsink-vertica1] Closed consumer 20:00:37.835 [dbus/tdf/histsink-vertica1-1] INFO org.apache.pulsar.functions.instance.JavaInstanceRunnable - Unloading JAR files for function InstanceConfig(instanceId=1, functionId=5f18d3e6-530c-49a3-a128-ff01a8069b98, functionVersion=2ef762e6-ed0d-44d4-aca3-39ec09d176e0, functionDetails=tenant: "dbus" namespace: "tdf" name: "histsink-vertica1" className: "org.apache.pulsar.functions.api.utils.IdentityFunction" parallelism: 3 source { typeClassName: "org.apache.pulsar.client.api.schema.GenericRecord" subscriptionName: "sub-tdf-histsink-vertica1" inputSpecs { key: "persistent://dbus/tdf/measure-det-15min" value { } } ... componentType: SINK customRuntimeOptions: "{ \"jobNamespace\" : \"dbus\" }" , maxBufferedTuples=1024, functionAuthenticationSpec=null, port=40545, clusterName=databus, maxPendingAsyncRequests=1000) 20:00:37.836 [main] INFO org.apache.pulsar.functions.runtime.JavaInstanceStarter - RuntimeSpawner quit, shutting down JavaInstance 20:00:37.837 [main] INFO org.apache.pulsar.client.impl.PulsarClientImpl - Client closing. URL: pulsar://pulsar2:6650 20:00:37.847 [pulsar-client-io-1-7] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0x7b07901f, L:/10.0.4.61:46984 ! R:pulsar1/10.0.4.64:6650] Disconnected 20:00:37.848 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0x5d65a905, L:/10.0.4.61:48236 ! R:pulsar2/10.0.4.60:6650] Disconnected 20:00:37.851 [pulsar-client-io-1-5] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0x053f4733, L:/10.0.4.61:56168 ! R:pulsar3/10.0.4.62:6650] Disconnected

To Reproduce We are producing data to all these topics with up to 4000 messages per topic every 5 minutes. After some hours (or even minutes) this issue arises.

We need to restart the sink. Sometimes they fail immediately and others they continue processing data.

We have option autoSkipNonRecoverableData enabled on all broker instances.

Expected behavior

Option autoSkipNonRecoverableData should let pulsar client feeding sink continue processing messages, even if any fail.

Ideally these "Bookie handle is not available" errors should not happen.

In any case the sink instances should be able to restart automatically.

Screenshots

Desktop (please complete the following information):

Additional context

codelipenghui commented 3 years ago

@fmiguelez The autoSkipNonRecoverableData only works for the ledger does not exist, does not work for some IO exception, this will avoid losing data in some unknown circumstances.

codelipenghui commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.

tisonkun commented 1 year ago

Closed as answered.