awslabs / amazon-kinesis-producer

Amazon Kinesis Producer Library
Apache License 2.0

Occasionally Get a BUNCH of this Opaque Error #46

Open rdifalco opened 8 years ago

rdifalco commented 8 years ago

Every once in a while I get a bunch of these at once (like hundreds). Unfortunately, they are pretty opaque. I don't know what the error means or what I am supposed to do in response to it. Is a retry possible in this case? Does a FATAL error mean I need to restart the native lib? So yeah: what does it mean, why did it happen, and what should I do?

java.lang.RuntimeException: EOF reached during read
    at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:498)
    at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:480)
    at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:476)
    at com.amazonaws.services.kinesis.producer.Daemon.readSome(Daemon.java:519)
    at com.amazonaws.services.kinesis.producer.Daemon.receiveMessage(Daemon.java:241)
    at com.amazonaws.services.kinesis.producer.Daemon.access$500(Daemon.java:61) 
    at com.amazonaws.services.kinesis.producer.Daemon$3.run(Daemon.java:296) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_66]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]

samuelgmartinez commented 8 years ago

We've seen a few of them. This usually happens when the native process dies, so the pipes are closed. That's why the KPL is complaining.

The real question is why the native lib process died. We struggled to figure that out most of the time...

rdifalco commented 8 years ago

That's our experience too. I have to say it doesn't appear that anyone from AWS works on KPL as a primary day-to-day project, and that worries me. @samuelgmartinez, do you find that the native process is restarted in a timely manner, or does this require manual intervention? I'd like to have a good way to handle this: for example, to know that it is restarting and pause production until it is back up.

samuelgmartinez commented 8 years ago

@rdifalco we just watch the Kinesis native process, and if it dies we restart the consumer (trying to do a graceful restart).

I don't know the details, since I'm not managing that directly (it's a devops/ops thing now :D), but I think they are doing it using something from systemd.
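
For anyone who would rather handle this inside the JVM than restart the whole application, here is a minimal sketch of a "rebuild and retry" wrapper. It is not part of the KPL: the class name RebuildingSender, its nested Send interface, and the factory/disposer parameters are all hypothetical, and the sketch does not assume anything about whether the library restarts its native process on its own.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical wrapper (not a KPL class). It owns one producer of type P and,
 * when a send fails, disposes of that producer, builds a fresh one, and retries.
 */
final class RebuildingSender<P> {

    /** A send operation that may throw, e.g. a blocking Future.get(). */
    interface Send<Q, R> {
        R apply(Q producer) throws Exception;
    }

    private final Supplier<P> factory;   // creates a fresh producer
    private final Consumer<P> disposer;  // releases a broken producer
    private P producer;
    private int rebuilds;

    RebuildingSender(Supplier<P> factory, Consumer<P> disposer) {
        this.factory = factory;
        this.disposer = disposer;
        this.producer = factory.get();
    }

    /** Runs op, rebuilding the producer between failed attempts. */
    synchronized <R> R send(Send<P, R> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.apply(producer);
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    rebuild();
                }
            }
        }
        throw last;
    }

    private void rebuild() {
        try {
            disposer.accept(producer);
        } catch (RuntimeException ignored) {
            // The old producer is already broken; a failed cleanup is not fatal here.
        }
        producer = factory.get();
        rebuilds++;
    }

    synchronized int rebuildCount() {
        return rebuilds;
    }
}
```

With the KPL, the factory could be something like () -> new KinesisProducer(config), the disposer KinesisProducer::destroy, and the operation p -> p.addUserRecord(stream, key, data).get(), where config, stream, key, and data are placeholders for your own values. Because send is synchronized, other callers block while a rebuild is in progress, which roughly gives the "pause production until it is back up" behavior asked about above. Retrying can produce duplicate records if the first attempt was partially delivered, so this only fits streams whose consumers tolerate duplicates.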

jeremysears commented 7 years ago

I see this as well w/ version 0.12.1.

pfifer commented 7 years ago

@jeremysears When you get the error message is there any output logged from the native process? It's still possible that the native process can die, but it should log some information about why it exited.

amiteswar commented 6 years ago

@pfifer I am also using KPL (0.12.9) in an AWS Lambda function written in Java 8 and getting this error. Here is the full stack trace of the exception I see. I am not seeing any extra information other than Caused by: java.lang.RuntimeException: EOF reached during read

01:34:59 java.util.concurrent.ExecutionException: java.lang.RuntimeException: EOF reached during read
         at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[task/:?]
         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[task/:?]
         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[task/:?]
         at com.pipeline.service.KinesisRecordsWriter.putRecords(KinesisRecordsWriter.java:84) ~[task/:?]
         at com.pipeline.service.PipelineService.processPayload(PipelineService.java:155) [task/:?]
         at com.pipeline.function.UnbundleHandler.handleRequest(UnbundleHandler.java:57) [task/:?]
         at com.pipeline.function.UnbundleHandler.handleRequest(UnbundleHandler.java:27) [task/:?]
         at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:178) [LambdaSandboxJava-1.0.jar:?]
         at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:888) [LambdaSandboxJava-1.0.jar:?]
         at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:286) [LambdaSandboxJava-1.0.jar:?]
         at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:64) [LambdaSandboxJava-1.0.jar:?]
         at java.lang.Class.forName0(Native Method) ~[?:1.8.0_141]
         at java.lang.Class.forName(Class.java:348) [?:1.8.0_141]
         at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:94) [LambdaJavaRTEntry-1.0.jar:?]
         Caused by: java.lang.RuntimeException: EOF reached during read
         at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:521) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:497) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:493) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon.readSome(Daemon.java:542) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon.receiveMessage(Daemon.java:246) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon.access$500(Daemon.java:63) ~[task/:?]
         at com.amazonaws.services.kinesis.producer.Daemon$3.run(Daemon.java:301) ~[task/:?]
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_141]
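
Since the trace above arrives wrapped in an ExecutionException, calling code has to look through the cause chain to tell this failure apart from ordinary per-record errors. The following is a minimal sketch of such a check. The class and method names are hypothetical (not part of the KPL), and matching on the message text "EOF reached during read" is an assumption based only on the traces in this thread, so it may break if the library changes its wording.

```java
/** Hypothetical helper, not part of the KPL. */
final class DaemonFailures {

    // Message text copied from the stack traces in this thread.
    static final String EOF_MESSAGE = "EOF reached during read";

    private DaemonFailures() {}

    /** Returns true if any throwable in the cause chain carries the EOF message. */
    static boolean looksLikeDaemonDeath(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(EOF_MESSAGE)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could wrap future.get() in a try/catch for ExecutionException and pass the caught exception to DaemonFailures.looksLikeDaemonDeath(e) to decide whether to treat the failure as a dead native process (and, say, recreate the producer) or as a regular failed put.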