Open rdifalco opened 8 years ago
We've seen a few of these. It usually happens when the native process dies, so the pipes are closed. That's why the KPL is complaining.
The real question is why the native lib process died. Most of the time we struggled to figure that out...
That's our experience too. I have to say it doesn't appear that anyone from AWS works on KPL as a primary day-to-day project, and that worries me. @samuelgmartinez, do you find that the native process is restarted in a timely manner, or does this require manual intervention? I'd like to have a good way to handle this: for example, to know that it is restarting and pause production until it is back up.
@rdifalco we just watch the kinesis native process and if it dies we restart the consumer (trying to do a graceful restart).
I don't know the details, since I'm not managing that directly (it's a devops/ops thing now :D), but I think they are doing it with something from systemd.
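For anyone who wants an in-process check rather than a systemd one, here is a rough sketch of the "watch the native process" idea. It is not what our ops setup does. It assumes Java 9+ (for ProcessHandle), that the native process is a descendant of the JVM, and that its executable path contains "kinesis_producer"; the class and method names are made up for illustration.

```java
import java.util.Optional;

/** Hypothetical helper: checks whether a child process of this JVM is still alive. */
final class NativeProcessWatch {

    /** True if the executable path is known and contains the given fragment. */
    static boolean matches(Optional<String> command, String needle) {
        return command.map(c -> c.contains(needle)).orElse(false);
    }

    /** True if any live descendant of this JVM has an executable path containing needle. */
    static boolean isRunning(String needle) {
        return ProcessHandle.current()
                .descendants()
                .filter(ProcessHandle::isAlive)
                .anyMatch(p -> matches(p.info().command(), needle));
    }
}
```

A scheduled task could call NativeProcessWatch.isRunning("kinesis_producer") and trigger the graceful restart when it returns false. Note that some platforms do not expose the executable path, in which case the check always returns false.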
I see this as well w/ version 0.12.1.
@jeremysears When you get the error message is there any output logged from the native process? It's still possible that the native process can die, but it should log some information about why it exited.
@pfifer I am also using KPL (0.12.9) in an AWS Lambda function written in Java 8 and I'm getting this error. Here is the full stack trace of the exception. I am not seeing any extra information other than Caused by: java.lang.RuntimeException: EOF reached during read
01:34:59 java.util.concurrent.ExecutionException: java.lang.RuntimeException: EOF reached during read
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[task/:?]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[task/:?]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[task/:?]
at com.pipeline.service.KinesisRecordsWriter.putRecords(KinesisRecordsWriter.java:84) ~[task/:?]
at com.pipeline.service.PipelineService.processPayload(PipelineService.java:155) [task/:?]
at com.pipeline.function.UnbundleHandler.handleRequest(UnbundleHandler.java:57) [task/:?]
at com.pipeline.function.UnbundleHandler.handleRequest(UnbundleHandler.java:27) [task/:?]
at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:178) [LambdaSandboxJava-1.0.jar:?]
at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:888) [LambdaSandboxJava-1.0.jar:?]
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:286) [LambdaSandboxJava-1.0.jar:?]
at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:64) [LambdaSandboxJava-1.0.jar:?]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_141]
at java.lang.Class.forName(Class.java:348) [?:1.8.0_141]
at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:94) [LambdaJavaRTEntry-1.0.jar:?]
Caused by: java.lang.RuntimeException: EOF reached during read
at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:521) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:497) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:493) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon.readSome(Daemon.java:542) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon.receiveMessage(Daemon.java:246) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon.access$500(Daemon.java:63) ~[task/:?]
at com.amazonaws.services.kinesis.producer.Daemon$3.run(Daemon.java:301) ~[task/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_141]
Every once in a while I get a bunch of these at once (like hundreds). Unfortunately they are pretty opaque. I don't know what the error means or what I am supposed to do in response to it. Is retry possible in this case? Does a FATAL error mean I need to restart the native lib? So yeah: what does it mean, why did it happen, and what should I do?
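On the retry question, a minimal sketch of what retrying could look like, under the assumption that a failed future is safe to resubmit and that the native process comes back (by itself or via an external restart) within the pause. Neither is confirmed in this thread. RetryingSender is a hypothetical helper, not part of the KPL; it only relies on the send call returning a java.util.concurrent.Future.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper: re-runs a send when the call or its future fails. */
final class RetryingSender {
    private final int maxAttempts;
    private final long pauseMillis;

    RetryingSender(int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    /** Calls attempt and waits for its future; on failure, pauses and tries again. */
    <T> T send(Callable<Future<T>> attempt) throws Exception {
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call().get();
            } catch (ExecutionException | RuntimeException e) {
                // Covers a failed future (e.g. "EOF reached during read")
                // and a synchronous failure from the send call itself.
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(pauseMillis); // give the native process time to come back
                }
            }
        }
        throw last; // all attempts failed; surface the final error
    }
}
```

With the KPL this would be used roughly as new RetryingSender(3, 1000).send(() -> producer.addUserRecord(stream, partitionKey, data)). Retrying can produce duplicate records if the original put actually reached Kinesis before the process died, so consumers should tolerate duplicates.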