apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.29k stars 3.59k forks source link

Flaky-test: ExtensibleLoadManagerTest.testIsolationPolicy #20608

Open lhotari opened 1 year ago

lhotari commented 1 year ago

Search before asking

Example failure

https://github.com/apache/pulsar/actions/runs/5309497330/jobs/9610705022?pr=20603#step:12:23713

Exception stacktrace

  Error:  Tests run: 20, Failures: 1, Errors: 0, Skipped: 16, Time elapsed: 545.625 s <<< FAILURE! - in TestSuite
  Error:  org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy  Time elapsed: 29.596 s  <<< FAILURE!
  java.lang.AssertionError: expected [true] but found [false]
    at org.testng.Assert.fail(Assert.java:110)
    at org.testng.Assert.failNotEquals(Assert.java:1577)
    at org.testng.Assert.assertTrue(Assert.java:56)
    at org.testng.Assert.assertTrue(Assert.java:66)
    at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:349)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
    at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Are you willing to submit a PR?

lhotari commented 1 year ago

Error in logs:

  2023-06-19T08:55:18,683 - ERROR - [TestNG-method=testIsolationPolicy-1:ExtensibleLoadManagerTest@348] - Failed to lookup topic: 
  org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: 
   --- An unexpected error occurred in the server ---

  Message: Failed to select the new owner broker for bundle: my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000

  Stacktrace:

  java.lang.IllegalStateException: Failed to select the new owner broker for bundle: my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getOwnerAsync$8(ExtensibleLoadManagerImpl.java:400)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getOwnerAsync$9(ExtensibleLoadManagerImpl.java:394)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.getOwnerAsync(ExtensibleLoadManagerImpl.java:384)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$assign$7(ExtensibleLoadManagerImpl.java:376)
    at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:409)
    at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:243)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.dedupeLookupRequest(ExtensibleLoadManagerImpl.java:457)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.assign(ExtensibleLoadManagerImpl.java:370)
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.findBrokerServiceUrl(ExtensibleLoadManagerWrapper.java:66)
    at org.apache.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$0(NamespaceService.java:198)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
    at org.apache.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$1(NamespaceService.java:191)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
    at org.apache.pulsar.broker.namespace.NamespaceService.getBrokerServiceUrlAsync(NamespaceService.java:189)
    at org.apache.pulsar.broker.lookup.TopicLookupBase.lambda$internalLookupTopicAsync$6(TopicLookupBase.java:85)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
    at org.apache.pulsar.broker.lookup.TopicLookupBase.internalLookupTopicAsync(TopicLookupBase.java:80)
    at org.apache.pulsar.broker.lookup.v2.TopicLookup.lookupTopicAsync(TopicLookup.java:69)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
    at org.eclipse.jetty.servlets.QoSFilter.doFilter(QoSFilter.java:202)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)

    at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.finishUpdate(AsyncHttpClientHandler.java:241) ~[async-http-client-2.12.1.jar:?]
    at org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:114) ~[async-http-client-2.12.1.jar:?]
    at org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:143) ~[async-http-client-2.12.1.jar:?]
    at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:78) ~[async-http-client-2.12.1.jar:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.93.Final.jar:4.1.93.Final]
    ... 1 more
  java.lang.AssertionError: expected [true] but found [false]
    at org.testng.Assert.fail(Assert.java:110)
    at org.testng.Assert.failNotEquals(Assert.java:1577)
    at org.testng.Assert.assertTrue(Assert.java:56)
    at org.testng.Assert.assertTrue(Assert.java:66)
    at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:349)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
    at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
lhotari commented 1 year ago

This test now fails in this location:

  2023-06-28T16:21:50,038 - INFO  - [main:TestRetrySupport@53] - Previous test run has failed before ExtensibleLoadManagerTest.testIsolationPolicy, failedSetupNumber=1. Running cleanup and setup.
    at org.testng.Assert.fail(Assert.java:110)
    at org.testng.Assert.fail(Assert.java:115)
    at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:351)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
    at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

on this line: https://github.com/apache/pulsar/blob/e5606f28b89511e23c975b5266fc80168518a978/tests/integration/src/test/java/org/apache/pulsar/tests/integration/loadbalance/ExtensibleLoadManagerTest.java#L351

One possible reason for this failure is the problem that metadata changes are eventually consistent in Pulsar. The lack of strong consistency causes flakiness which is also a production code problem. A good example is https://github.com/apache/pulsar/issues/12555#issuecomment-955748744 . The metadata consistency problems are also covered in the blog post https://codingthestreams.com/pulsar/2022/10/18/view-of-pulsar-metadata-store.html .

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.

Technoboy- commented 11 months ago

https://github.com/apache/pulsar/actions/runs/7228625557/job/19714020084?pr=21732

lhotari commented 11 months ago

Flaky-test: org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy Number of failures: 2

org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest is flaky. The testIsolationPolicy test method fails sporadically.

java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2023-12-17T12:45:31.9552882Z

Full exception stacktrace
java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:269)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
"
to contain at least one of the following elements:
java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2023-12-16T06:33:50.6223790Z

Full exception stacktrace
java.lang.AssertionError:

Expecting actual:
  "
 --- An unexpected error occurred in the server ---

Message: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.

Stacktrace:

java.lang.IllegalStateException: Failed to lookup broker:pulsar-broker-1:8080 for bundle:my-tenant/my-isolation-policy-anti-affinity-enabled/0x40000000_0x80000000, the broker has not been registered.
    at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$getBrokerLookupData$12(ExtensibleLoadManagerImpl.java:529)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:269)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:1583)
"
to contain at least one of the following elements:
lhotari commented 11 months ago

uploaded logs to https://gist.githubusercontent.com/lhotari/4ac3082f8ab4806a138095599bb6de7d/raw/e8604ae47515c97c0ff25f6849051f515d6c73b3/ExtensibleLoadManagerTest_testIsolationPolicy_logs.txt

The log file is from the build's Integration-LOADBALANCE-surefire-reports.zip file and extracted with this command:

xmlstarlet sel -t -m '//testcase[@name="testIsolationPolicy"]' -v @classname -o '.' -v @name -n -v failure -n -v system-out -n tests/integration/target/surefire-reports/TEST-TestSuite.xml > ExtensibleLoadManagerTest_testIsolationPolicy_logs.txt

(brew install xmlstarlet to get xmlstarlet)

lhotari commented 11 months ago

Error logs contains a lot of errors like this:

2023-12-19T12:49:41,746+0000 [pulsar-load-manager-1-1] ERROR org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - The broker:pulsar-broker-1:8080 failed to change the role. Retrying 5 th ...
org.apache.pulsar.broker.loadbalance.extensions.store.LoadDataStoreException: org.apache.pulsar.client.api.PulsarClientException$TopicDoesNotExistException: {"errorMsg":"Topic non-persistent://pulsar/system/loadbalancer-top-bundles-load-data does not exist","reqId":4571777132239267974, "remote":"pulsar-broker-1/172.18.0.7:6650", "local":"/172.18.0.7:46828"}
        at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.startTableView(TableViewLoadDataStoreImpl.java:109) ~[org.apache.pulsar-pulsar-broker-3.2.0-SNAPSHOT.jar:3.2.0-SNAPS

And it seems that it never recovers. Is this expected?

lhotari commented 11 months ago

I wonder if this is related to https://github.com/apache/pulsar-client-cpp/issues/86 . There have been issues in the past where a topic that has been created on one broker won't be immediately visible on another broker.

heesung-sn commented 11 months ago

Raised a fix : https://github.com/apache/pulsar/pull/21764

lhotari commented 9 months ago

Recent failure:

  Error:  Tests run: 20, Failures: 1, Errors: 0, Skipped: 16, Time elapsed: 595.011 s <<< FAILURE! - in TestSuite
  Error:  org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy  Time elapsed: 47.812 s  <<< FAILURE!
  org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest expected [2] but found [3] within 30 seconds.
    at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
    at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
    at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
    at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
    at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
    at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.testIsolationPolicy(ExtensibleLoadManagerTest.java:374)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
    at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
  Caused by: java.lang.AssertionError: expected [2] but found [3]
    at org.testng.Assert.fail(Assert.java:110)
    at org.testng.Assert.failNotEquals(Assert.java:1577)
    at org.testng.Assert.assertEqualsImpl(Assert.java:149)
    at org.testng.Assert.assertEquals(Assert.java:131)
    at org.testng.Assert.assertEquals(Assert.java:1418)
    at org.testng.Assert.assertEquals(Assert.java:1382)
    at org.testng.Assert.assertEquals(Assert.java:1428)
    at org.apache.pulsar.tests.integration.loadbalance.ExtensibleLoadManagerTest.lambda$testIsolationPolicy$7(ExtensibleLoadManagerTest.java:377)
    at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
    at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
    at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
    ... 4 more

in https://github.com/apache/pulsar/actions/runs/8046911413/job/21976191499?pr=22122#step:12:22339