Huawei-Spark / Spark-SQL-on-HBase

Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Apache License 2.0

Error on executing 'Select * from tablename' #22

Closed: rkiyer999 closed this issue 8 years ago

rkiyer999 commented 8 years ago

I am getting an index-out-of-bounds error when I execute 'select * from table'. Please find the details below:

HBase table:

describe 'sales'
Table sales is ENABLED
sales
COLUMN FAMILIES DESCRIPTION
{NAME => 'sales_des', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1240 seconds

scan 'sales'
ROW  COLUMN+CELL
 0   column=sales_des:product, timestamp=1444305686288, value=pr0
 0   column=sales_des:quantity, timestamp=1444311988162, value=0
 0   column=sales_des:region, timestamp=1444305702221, value=reg0
 0   column=sales_des:sales, timestamp=1444312378336, value=0
 0   column=sales_des:tranid, timestamp=1444302264948, value=0
1 row(s) in 0.4380 seconds
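For context (this is an assumption, not something stated in the issue): cell values in this shape are what HBase shell puts produce, since the shell stores every value as a plain UTF-8 string rather than as a binary-encoded number. Puts like the following would yield the scan output above:

put 'sales', '0', 'sales_des:tranid', '0'
put 'sales', '0', 'sales_des:product', 'pr0'
put 'sales', '0', 'sales_des:region', 'reg0'
put 'sales', '0', 'sales_des:sales', '0'
put 'sales', '0', 'sales_des:quantity', '0'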

HBase Spark SQL:

CREATE TABLE sales(tranid INTEGER, product STRING, region STRING, sales INTEGER, quantity INTEGER, PRIMARY KEY (tranid))
MAPPED BY (sales, COLS=[product=sales_des.product, region=sales_des.region, sales=sales_des.sales, quantity=sales_des.quantity]);

Error: select * from sales;

15/10/08 15:15:35 INFO hbase.HBaseSQLCliDriver: Processing select * from sales
15/10/08 15:15:35 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=sandbox.hortonworks.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a713416, quorum=sandbox.hortonworks.com:2181, baseZNode=/hbase-unsecure
15/10/08 15:15:35 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a713416 connecting to ZooKeeper ensemble=sandbox.hortonworks.com:2181
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Opening socket connection to server sandbox.hortonworks.com/10.0.2.15:2181. Will not attempt to authenticate using SASL (unknown error)
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Socket connection established to sandbox.hortonworks.com/10.0.2.15:2181, initiating session
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Session establishment complete on server sandbox.hortonworks.com/10.0.2.15:2181, sessionid = 0x15046c4e1230031, negotiated timeout = 40000
15/10/08 15:15:35 INFO zookeeper.ZooKeeper: Session: 0x15046c4e1230031 closed
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/08 15:15:35 INFO hbase.HBaseRelation: Number of HBase regions for table sales: 1
15/10/08 15:15:35 INFO spark.SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Got job 6 (main at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Final stage: ResultStage 6(main at NativeMethodAccessorImpl.java:-2)
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Missing parents: List()
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Submitting ResultStage 6 (MapPartitionsRDD[13] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
15/10/08 15:15:35 INFO storage.MemoryStore: ensureFreeSpace(18176) called with curMem=2931, maxMem=278302556
15/10/08 15:15:35 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 17.8 KB, free 265.4 MB)
15/10/08 15:15:35 INFO storage.MemoryStore: ensureFreeSpace(16520) called with curMem=21107, maxMem=278302556
15/10/08 15:15:36 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 16.1 KB, free 265.4 MB)
15/10/08 15:15:36 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:60580 (size: 16.1 KB, free: 265.4 MB)
15/10/08 15:15:36 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:874
15/10/08 15:15:36 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 6 (MapPartitionsRDD[13] at main at NativeMethodAccessorImpl.java:-2)
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
15/10/08 15:15:36 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, localhost, ANY, 1702 bytes)
15/10/08 15:15:36 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 6)
15/10/08 15:15:36 INFO hbase.HBasePartition: None
15/10/08 15:15:36 ERROR executor.Executor: Exception in task 0.0 in stage 6.0 (TID 6)
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
    at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:979)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/10/08 15:15:36 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 6, localhost): java.lang.ArrayIndexOutOfBoundsException: 1
    [identical stack trace to the one above]
15/10/08 15:15:36 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 1 times; aborting job
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Cancelling stage 6
15/10/08 15:15:36 INFO scheduler.DAGScheduler: ResultStage 6 (main at NativeMethodAccessorImpl.java:-2) failed in 0.233 s
15/10/08 15:15:36 INFO scheduler.DAGScheduler: Job 6 failed: main at NativeMethodAccessorImpl.java:-2, took 0.279367 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 6, localhost): java.lang.ArrayIndexOutOfBoundsException: 1
    [identical stack trace to the one above]

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

astro> exit;

15/10/08 15:39:40 INFO spark.SparkContext: Invoking stop() from shutdown hook
[routine shutdown log follows: all ServletContextHandlers and the Spark web UI at http://10.0.2.15:4040 are stopped, DAGScheduler, MapOutputTrackerMasterEndpoint, MemoryStore, BlockManager, BlockManagerMaster and OutputCommitCoordinator are stopped, the SparkContext stops successfully, and the temporary directory /tmp/spark-5e84c9ec-e1b7-4f12-a466-f035c0ca6e7b is deleted]
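A quick way to confirm the encoding mismatch behind the ArrayIndexOutOfBoundsException (a suggested check, not part of the original thread) is to inspect the raw cell bytes from the HBase shell; a string-encoded zero shows up as the single character '0', while a binary-encoded 32-bit integer would show up as four bytes:

scan 'sales', {COLUMNS => ['sales_des:quantity']}
# value=0                  -> one byte, the ASCII character '0' (string encoding)
# value=\x00\x00\x00\x00   -> four bytes (binary integer encoding)

BinaryBytesUtils.toInt in the stack trace above reads a fixed number of bytes for an INTEGER column, so on a one-byte string value it runs past the end of the array at index 1, which matches the error message.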

xinyunh commented 8 years ago

@rkiyer999,

You might try adding "IN StringFormat" at the end of the CREATE TABLE statement.
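In other words, the scan output above shows the cells stored as plain strings (value=pr0, value=0), while the default mapping decodes INTEGER columns as fixed-width binary values, which is what makes BinaryBytesUtils.toInt run off the end of a one-byte array. Assuming the clause is simply appended after the MAPPED BY clause (following "at the end of the statement" above), the corrected DDL would look something like:

CREATE TABLE sales(tranid INTEGER, product STRING, region STRING, sales INTEGER, quantity INTEGER, PRIMARY KEY (tranid))
MAPPED BY (sales, COLS=[product=sales_des.product, region=sales_des.region, sales=sales_des.sales, quantity=sales_des.quantity])
IN StringFormat;

You will likely need to drop the existing Spark SQL table definition and recreate it with this clause; after that, the same SELECT * FROM sales should decode the string-encoded cells without the out-of-bounds error.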