caroljmcdonald / mapr-streams-sparkstreaming-hbase

24 stars 23 forks source link

java.lang.AssertionError: assertion failed: Failed to get records for /user/user01/pump:sensor #1

Open jamesrgrinter opened 8 years ago

jamesrgrinter commented 8 years ago

After updating the version of mapr-spark in the MapR 5.1.0 GA Sandbox from v1.5.2 to v1.6.1.201607242143, the Scala-based Spark-streaming consumers no longer run.

I've also tried applying the MapR 5.1.0 June 2016 Patch, but that hasn't made a difference.

Here's what I get:

$ spark-submit --class solution.SensorStreamConsumer --master local[2] ms-sparkstreaming-1.0.jar
16/08/03 13:42:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
start streaming
[Stage 1:>                                                          (0 + 2) / 3]16/08/03 13:42:48 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.AssertionError: assertion failed: Failed to get records for /user/user01/pump:sensor)
|0 1026779 after polling for 1000
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.streaming.kafka.v09.KafkaRDD$KafkaRDDIterator.fetchBatch(KafkaRDD.scala:203)
    at org.apache.spark.streaming.kafka.v09.KafkaRDD$KafkaRDDIterator.hasNext(KafkaRDD.scala:173)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
...
16/08/03 13:42:48 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
16/08/03 13:42:48 ERROR JobScheduler: Error running job streaming job 1470228164000 ms.0

I don't understand why an assertion was added in MapR's 1.6.1 version of KafkaRDD.scala.fetchBatch (seems to have come with a commit for MAPR-23854)- with my limited Kafka/MapR streams knowledge it seems plausible for a read to return 0 bytes.

Tracy6465 commented 7 years ago

You have solved this problem?

caroljmcdonald commented 7 years ago

the work around is to increase the polling time out. I agree that it should not be an error

On Thu, Jan 19, 2017 at 2:49 AM, Chris AuYeung notifications@github.com wrote:

You have solved this problem?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/caroljmcdonald/mapr-streams-sparkstreaming-hbase/issues/1#issuecomment-273704215, or mute the thread https://github.com/notifications/unsubscribe-auth/ADMEZJ_mflfQmkAvtqhthSFXNCiOXltoks5rTxWDgaJpZM4Jbt7e .