Open alechenninger opened 7 years ago
@dcrissman @snehagunta
So this has to do with the timeout handling we recently added. Basically, currently we don't share a bulk requester between multiple threads, and wait for all futures individually. However, when one times out (a new feature), we move on to the next future, which will be in the same batch, but at that point the completer is running in another thread, so the next future's .get will be called and won't be completed, triggering the bug.
The fix is complicated. A workaround is to set the timeout really high for messages. I'd like to talk about fixing for real, but I will be out next week.
@snehagunta We should also look at why those messages are timing out
Seen exception:
2017-04-26 14:26:36,685 [Camel (camel-1) thread #0 - SjmsBatchConsumer] ERROR [org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute] Failed to process message: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:408)
at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyTransformableFuture.get(BulkLightblueRequester.java:383)
at org.esbtools.eventhandler.lightblue.client.BulkLightblueRequester$LazyRequestTransformableFuture.get(BulkLightblueRequester.java:548)
at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:64)
at org.esbtools.eventhandler.NestedTransformableFutureIgnoringReturn.get(NestedTransformableFutureIgnoringReturn.java:25)
at org.esbtools.eventhandler.AsyncBatchMessageProcessorRoute.lambda$configure$0(AsyncBatchMessageProcessorRoute.java:123)
at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:121)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:87)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.processBatch(SjmsBatchConsumer.java:427)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.access$1900(SjmsBatchConsumer.java:215)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.completionBatch(SjmsBatchConsumer.java:367)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.consumeBatchesOnLoop(SjmsBatchConsumer.java:303)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop$BatchConsumptionTask.access$400(SjmsBatchConsumer.java:269)
at org.apache.camel.component.sjms.batch.SjmsBatchConsumer$BatchConsumptionLoop.run(SjmsBatchConsumer.java:236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Future attempted to lazily trigger completion, but completer did not actually complete the future. Check the provided completer function for correctness.
... 24 more
I think to fix we need to:
This means however, that a timed-out future may fail everything in that batch, since the thread making the requests is interrupted, therefore throws an exception, therefore fails all of the futures in that batch. However, we have retries and idempotent messages for this reason already.
Food for thought: hystrix already has request batching built in. We could consider instead making a HystrixBulkLightblueRequester that made each request using hystrix, and let hystrix handle the thread pooling and batching, as long as we made commands batchable. This would probably be more maintainable, safer, and easier to implement. I would strongly consider that route.
I think this can happen?
Seems like an obvious miss, so perhaps this is not right. Will look later.