EnMasseProject / enmasse

EnMasse - Self-service messaging on Kubernetes and OpenShift
https://enmasseproject.github.io
Apache License 2.0
190 stars 87 forks source link

Cannot send/receive messages in standard addressspace when one qdrouter pod is deleted #2127

Open kornys opened 5 years ago

kornys commented 5 years ago

Describe the bug When one of two qdrouter pods is deleted and test immediately tries to send/receive message, client fails on timetout.

To Reproduce Steps to reproduce the behavior:

  1. Create standard address space
  2. Create topic/queue addresses (standard-small-queue, standard-small-topic, standard-large-queue, standard-large-topic)
  3. wait for addresses are ready
  4. delete one of the qdrouter pod
  5. try attach client to address immediately

Expected behavior Client should connect immediately because there is still one qdrouter pod active

Test log

testlog.zip

...
2018-12-15T17:38:34.892Z INFO  [TestUtils] Checking {name=test-queue, address=test-queue} for ready state
2018-12-15T17:38:39.893Z INFO  [TimeMeasuringSystem] End time of operation CREATE_ADDRESS-dd4cefd0 is correctly stored
2018-12-15T17:38:39.893Z INFO  [GlobalLogCollector] Collecting configmaps for namespace myproject
2018-12-15T17:38:40.965Z INFO  [AmqpClientFactory] External endpoint: messaging-9vvx8n91wl-myproject.10.37.144.48.nip.io:443, internal: messaging-9vvx8n91wl-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:41.959Z INFO  [Receiver] Receiver link 'test-queue' opened, granting credits
2018-12-15T17:38:42.536Z INFO  [Sender] Sender link 'test-queue' opened, sending messages
2018-12-15T17:38:43.556Z INFO  [AmqpClientFactory] External endpoint: messaging-9vvx8n91wl-myproject.10.37.144.48.nip.io:443, internal: messaging-9vvx8n91wl-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:44.277Z INFO  [Receiver] Receiver link 'test-topic' opened, granting credits
2018-12-15T17:38:44.803Z INFO  [Sender] Sender link 'test-topic' opened, sending messages
2018-12-15T17:38:45.847Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:46.030Z INFO  [Receiver] Receiver link 'test-queue' opened, granting credits
2018-12-15T17:38:46.185Z INFO  [Sender] Sender link 'test-queue' opened, sending messages
2018-12-15T17:38:47.534Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:47.718Z INFO  [Receiver] Receiver link 'test-topic' opened, granting credits
2018-12-15T17:38:47.877Z INFO  [Sender] Sender link 'test-topic' opened, sending messages
2018-12-15T17:38:48.883Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:49.038Z INFO  [Receiver] Receiver link 'test-queue-sharded' opened, granting credits
2018-12-15T17:38:49.192Z INFO  [Sender] Sender link 'test-queue-sharded' opened, sending messages
2018-12-15T17:38:50.535Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:50.722Z INFO  [Receiver] Receiver link 'test-topic-sharded' opened, granting credits
2018-12-15T17:38:50.885Z INFO  [Sender] Sender link 'test-topic-sharded' opened, sending messages
2018-12-15T17:38:51.892Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:52.041Z INFO  [Receiver] Receiver link 'test-anycast' opened, granting credits
2018-12-15T17:38:52.195Z INFO  [Sender] Sender link 'test-anycast' opened, sending messages
2018-12-15T17:38:53.198Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:53.352Z INFO  [Receiver] Receiver link 'test-multicast' opened, granting credits
2018-12-15T17:38:53.526Z INFO  [Sender] Sender link 'test-multicast' opened, sending messages
2018-12-15T17:38:53.528Z INFO  [CommonTest] ------------------------------------------------------------
2018-12-15T17:38:53.528Z INFO  [CommonTest] ------------------- Start with restating -------------------
2018-12-15T17:38:53.528Z INFO  [CommonTest] ------------------------------------------------------------
2018-12-15T17:38:53.682Z INFO  [Kubernetes] Pod qdrouterd-y0vy94skua-0 removed
2018-12-15T17:38:53.682Z INFO  [CommonTest] Check if system works
2018-12-15T17:38:54.683Z INFO  [AmqpClientFactory] External endpoint: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443, internal: messaging-y0vy94skua-myproject.10.37.144.48.nip.io:443
2018-12-15T17:38:54.860Z INFO  [Receiver] Receiver link 'test-queue' opened, granting credits
2018-12-15T17:38:55.022Z INFO  [Sender] Sender link 'test-queue' opened, sending messages
2018-12-15T17:39:55.030Z WARN  [TestBase] Remove address spaces in tear down - SKIPPED!
2018-12-15T17:39:55.031Z INFO  [TimeMeasuringSystem] End time of operation TEST_EXECUTION is correctly stored
2018-12-15T17:39:55.031Z ERROR [ITestSeparator] Caught exception {}
java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.enmasse.systemtest.bases.TestBase.canConnectWithAmqpAddress(TestBase.java:466)
    at io.enmasse.systemtest.bases.TestBase.assertCanConnect(TestBase.java:442)
    at io.enmasse.systemtest.common.CommonTest.assertSystemWorks(CommonTest.java:114)
    at io.enmasse.systemtest.common.CommonTest.testRestartComponents(CommonTest.java:94)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:436)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:170)
    at org.junit.jupiter.engine.execution.ThrowableCollector.execute(ThrowableCollector.java:40)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:166)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:113)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:58)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:112)
    at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$2(HierarchicalTestExecutor.java:120)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:120)
    at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$2(HierarchicalTestExecutor.java:120)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.lambda$executeRecursively$3(HierarchicalTestExecutor.java:120)
    at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.executeRecursively(HierarchicalTestExecutor.java:108)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor$NodeExecutor.execute(HierarchicalTestExecutor.java:79)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:55)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:43)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:170)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:154)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:90)
    at org.junit.platform.surefire.provider.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:132)
    at org.junit.platform.surefire.provider.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:111)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)
2018-12-15T17:39:55.035Z INFO  [ITestSeparator] io.enmasse.systemtest.common.CommonTest.testRestartComponents-FINISHED
2018-12-15T17:39:55.035Z INFO  [ITestSeparator] ####################################################################################################
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 395.303 sec <<< FAILURE! - in io.enmasse.systemtest.common.CommonTest
testRestartComponents  Time elapsed: 394.588 sec  <<< ERROR!
java.util.concurrent.TimeoutException
    at io.enmasse.systemtest.common.CommonTest.assertSystemWorks(CommonTest.java:114)
    at io.enmasse.systemtest.common.CommonTest.testRestartComponents(CommonTest.java:94)

2018-12-15T17:39:55.049Z WARN  [ExecutionListener] Remove address spaces when test run finished - SKIPPED!
2018-12-15T17:39:55.051Z INFO  [TimeMeasuringSystem] ================================================
2018-12-15T17:39:55.051Z INFO  [TimeMeasuringSystem] ================================================
2018-12-15T17:39:55.051Z INFO  [TimeMeasuringSystem] io.enmasse.systemtest.common.CommonTest
2018-12-15T17:39:55.051Z INFO  [TimeMeasuringSystem] ---------------------------------------------
2018-12-15T17:39:55.051Z INFO  [TimeMeasuringSystem] testRestartComponents
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: TEST_EXECUTION duration: 00:06:34,560 started: 18:33:20,470 ended: 18:39:55,030
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: CREATE_ADDRESS_SPACE-f5e9a268 duration: 00:02:24,393 started: 18:33:21,974 ended: 18:35:46,367
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: CREATE_USER-7e82bc30 duration: 00:00:01,353 started: 18:35:49,308 ended: 18:35:50,661
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: CREATE_USER-58e376e6 duration: 00:00:01,279 started: 18:35:52,983 ended: 18:35:54,262
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: CREATE_ADDRESS-bc3536a5 duration: 00:00:52,284 started: 18:35:58,571 ended: 18:36:50,855
2018-12-15T17:39:55.052Z INFO  [TimeMeasuringSystem] Operation id: CREATE_ADDRESS-dd4cefd0 duration: 00:01:45,101 started: 18:36:54,792 ended: 18:38:39,893

Results :

Tests in error:
  CommonTest.testRestartComponents:94->assertSystemWorks:114->TestBase.assertCanConnect:442->TestBase.canConnectWithAmqpAddress:466 ? Timeout

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
kornys commented 4 years ago

@lulf I think this a still actual

lulf commented 4 years ago

The client should retry more than 1 time, as there will always be a small time window where router is shutting down and load balancer has not detected it yet.

kornys commented 4 years ago

Ok make sense, I will change test