Buffer isFull RxJava Exception

satybald commented 7 years ago

During using 0.13 version of the library we found that stream observer got stopped due to an exception. This happened on staging with eventlog subscription that has small number of events ~1000

nakadi.shadow.io.reactivex.exceptions.MissingBackpressureException: Buffer is full
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableOnBackpressureBuffer$BackpressureBufferSubscriber.onNext(FlowableOnBackpressureBuffer.java:99)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableDoOnEach$DoOnEachSubscriber.onNext(FlowableDoOnEach.java:91)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.fastPath(FlowableFromIterable.java:181)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:123)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.subscribers.BasicFuseableSubscriber.request(BasicFuseableSubscriber.java:153)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableOnBackpressureBuffer$BackpressureBufferSubscriber.onSubscribe(FlowableOnBackpressureBuffer.java:91)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.subscribers.BasicFuseableSubscriber.onSubscribe(BasicFuseableSubscriber.java:67)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableFromIterable.subscribe(FlowableFromIterable.java:69)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableFromIterable.subscribeActual(FlowableFromIterable.java:47)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:13013)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableDoOnEach.subscribeActual(FlowableDoOnEach.java:49)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:13013)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableOnBackpressureBuffer.subscribeActual(FlowableOnBackpressureBuffer.java:46)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:13013)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableMap.subscribeActual(FlowableMap.java:38)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:13013)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:12960)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableUsing.subscribeActual(FlowableUsing.java:73)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:13013)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.Flowable.subscribe(Flowable.java:12960)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.operators.flowable.FlowableSubscribeOn$SubscribeOnSubscriber.run(FlowableSubscribeOn.java:82)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.schedulers.ExecutorScheduler$ExecutorWorker$BooleanRunnable.run(ExecutorScheduler.java:260)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at nakadi.shadow.io.reactivex.internal.schedulers.ExecutorScheduler$ExecutorWorker.run(ExecutorScheduler.java:225)
07:39:14.293 Oct 10 05:39:13 docker/ca5aa5dcdb70[860]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142

dehora commented 7 years ago

This ultimately means the observer part of the processor isn't keeping up. 0.9.15.b1 is a beta release that tries to make this a bit easier by not configuring a custom event batch buffer size at all. That's based on multiple test runs with various buffer configurations and server params (seeing less retries and disconnect cycles over the period of an hour or more). It's counter-intuitive and the comment in https://github.com/dehora/nakadi-java/pull/299 is worth reading, but the short version is smaller underlying batch buffers seem to be more stable overall.

For a real world usecase maxUncommittedEvents, batchLimit and (maybe) batchFlushTimeout seem to be also worth tuning. Low maxUncommittedEvents especially seem to result in unstable client/server cycles, but a very high number can mean a consuming a backlog of events that won't be committable to the checkpointer because the session they were fetched under is expired.

The consumer protocol is quite complicated, I think too complicated. But an underlying thing seems to be to not consume large data sets from Nakadi unless the observer is very fast.

dehora commented 7 years ago

Closing for now. The default since 0.9.15 and up seems to be better and beyond that, it requires tuning based on factors like the observer's ability, rate of events, consumer configuration and hidden nakadi defaults, which are outside the client's ability to handle.

satybald commented 7 years ago

agree, after 0.15 release we didn't notice such exception for a while. thanks @dehora!

dehora / nakadi-java

Buffer isFull RxJava Exception #294