Huawei-Spark / Spark-SQL-on-HBase

Native, optimized access to HBase data through Spark SQL / DataFrame interfaces
Apache License 2.0

toInt Error #18

Closed. kingsaction closed this issue 9 years ago.

kingsaction commented 9 years ago

When I create a table and declare one of its columns as Int, the table is created fine, but executing SQL against it fails. The error information:

astro> select * from test3;
15/09/29 01:09:50 INFO HBaseSQLCliDriver: Processing select * from test3
15/09/29 01:09:50 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/09/29 01:09:50 INFO ZooKeeper: Client environment:host.name=localhost
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.version=1.7.0_25
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.home=/cloudera/jdk1.7/jre
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.class.path=/cloudera/spark/lib/datanucleus-api-jdo-3.2.6.jar:/cloudera/spark/lib/hbase-it-0.98.6-cdh5.3.1-tests.jar:/cloudera/spark/lib/datanucleus-core-3.2.10.jar:/cloudera/spark/lib/spark-sql-on-hbase-1.0.0.jar:/cloudera/spark/lib/original-spark-sql-on-hbase-1.0.0.jar:/cloudera/spark/lib/spark-examples-1.4.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/cloudera/spark/lib/hbase-prefix-tree-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/hbase-server-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/hbase-testing-util-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/hbase-shell-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/hbase-thrift-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/hbase-server-0.98.6-cdh5.3.1-tests.jar:/cloudera/spark/lib/datanucleus-rdbms-3.2.9.jar:/cloudera/spark/lib/hbase-protocol-0.98.6-cdh5.3.1.jar:/cloudera/spark/lib/protobuf-java-2.5.0.jar:/cloudera/spark/lib/spark-assembly-1.4.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/cloudera/spark/conf/:/cloudera/spark/lib/spark-assembly-1.4.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/cloudera/spark/lib/datanucleus-api-jdo-3.2.6.jar:/cloudera/spark/lib/datanucleus-core-3.2.10.jar:/cloudera/spark/lib/datanucleus-rdbms-3.2.9.jar
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/09/29 01:09:50 INFO ZooKeeper: Client environment:java.compiler=
15/09/29 01:09:50 INFO ZooKeeper: Client environment:os.name=Linux
15/09/29 01:09:50 INFO ZooKeeper: Client environment:os.arch=amd64
15/09/29 01:09:50 INFO ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
15/09/29 01:09:50 INFO ZooKeeper: Client environment:user.name=master
15/09/29 01:09:50 INFO ZooKeeper: Client environment:user.home=/home/master
15/09/29 01:09:50 INFO ZooKeeper: Client environment:user.dir=/cloudera/spark-hbase/bin
15/09/29 01:09:50 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x526ebdba, quorum=localhost:2181, baseZNode=/hbase
15/09/29 01:09:50 INFO RecoverableZooKeeper: Process identifier=hconnection-0x526ebdba connecting to ZooKeeper ensemble=localhost:2181
15/09/29 01:09:50 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
15/09/29 01:09:50 INFO ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
15/09/29 01:09:50 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15014b5f5040037, negotiated timeout = 90000
15/09/29 01:09:51 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x526ebdba, quorum=localhost:2181, baseZNode=/hbase
15/09/29 01:09:51 INFO RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x526ebdba connecting to ZooKeeper ensemble=localhost:2181
15/09/29 01:09:51 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
15/09/29 01:09:51 INFO ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
15/09/29 01:09:51 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15014b5f5040038, negotiated timeout = 90000
15/09/29 01:09:51 INFO deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
15/09/29 01:09:51 INFO ZooKeeper: Session: 0x15014b5f5040038 closed
15/09/29 01:09:51 INFO ClientCnxn: EventThread shut down
15/09/29 01:09:52 INFO HBaseRelation: Number of HBase regions for table test: 1
15/09/29 01:09:52 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x526ebdba, quorum=localhost:2181, baseZNode=/hbase
15/09/29 01:09:52 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/09/29 01:09:52 INFO ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/09/29 01:09:52 INFO RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x526ebdba connecting to ZooKeeper ensemble=localhost:2181
15/09/29 01:09:52 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15014b5f5040039, negotiated timeout = 90000
15/09/29 01:09:52 INFO ZooKeeper: Session: 0x15014b5f5040039 closed
15/09/29 01:09:52 INFO ClientCnxn: EventThread shut down
15/09/29 01:09:52 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
15/09/29 01:09:52 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
15/09/29 01:09:52 INFO DAGScheduler: Final stage: ResultStage 0(main at NativeMethodAccessorImpl.java:-2)
15/09/29 01:09:52 INFO DAGScheduler: Parents of final stage: List()
15/09/29 01:09:52 INFO DAGScheduler: Missing parents: List()
15/09/29 01:09:52 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
15/09/29 01:09:52 INFO MemoryStore: ensureFreeSpace(14784) called with curMem=0, maxMem=277842493
15/09/29 01:09:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 14.4 KB, free 265.0 MB)
15/09/29 01:09:52 INFO MemoryStore: ensureFreeSpace(13323) called with curMem=14784, maxMem=277842493
15/09/29 01:09:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.0 KB, free 264.9 MB)
15/09/29 01:09:52 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:44603 (size: 13.0 KB, free: 265.0 MB)
15/09/29 01:09:52 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/09/29 01:09:52 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2)
15/09/29 01:09:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/09/29 01:09:52 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, NODE_LOCAL, 1688 bytes)
15/09/29 01:09:52 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/09/29 01:09:52 INFO Executor: Fetching http://127.0.0.1:41302/jars/spark-sql-on-hbase-1.0.0.jar with timestamp 1443460180930
15/09/29 01:09:52 INFO Utils: Fetching http://127.0.0.1:41302/jars/spark-sql-on-hbase-1.0.0.jar to /tmp/spark-7ce18148-9a8f-4dda-9910-4de00e33c41f/userFiles-adc950d7-749a-4fd7-b542-09871a7e0686/fetchFileTemp1293095241738742836.tmp
15/09/29 01:09:53 INFO Executor: Adding file:/tmp/spark-7ce18148-9a8f-4dda-9910-4de00e33c41f/userFiles-adc950d7-749a-4fd7-b542-09871a7e0686/spark-sql-on-hbase-1.0.0.jar to class loader
15/09/29 01:09:53 INFO HBasePartition: None
15/09/29 01:09:53 INFO deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
15/09/29 01:09:53 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayIndexOutOfBoundsException: 26
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
    at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
    at org.apache.spark.sql.hbase.HBaseRelation.org$apache$spark$sql$hbase$HBaseRelation$$setColumn(HBaseRelation.scala:892)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:976)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
15/09/29 01:09:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ArrayIndexOutOfBoundsException: 26
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
    at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
    at org.apache.spark.sql.hbase.HBaseRelation.org$apache$spark$sql$hbase$HBaseRelation$$setColumn(HBaseRelation.scala:892)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:976)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
15/09/29 01:09:53 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/09/29 01:09:53 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/09/29 01:09:53 INFO TaskSchedulerImpl: Cancelling stage 0
15/09/29 01:09:53 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:-2) failed in 0.434 s
15/09/29 01:09:53 INFO DAGScheduler: Job 0 failed: main at NativeMethodAccessorImpl.java:-2, took 0.603352 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ArrayIndexOutOfBoundsException: 26
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
    at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
    at org.apache.spark.sql.hbase.HBaseRelation.org$apache$spark$sql$hbase$HBaseRelation$$setColumn(HBaseRelation.scala:892)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:976)
    at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
    at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
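For context: the task dies inside BinaryBytesUtils.toInt while decoding the Int column, which suggests the bytes stored in the HBase cell are shorter than the fixed-width binary encoding the reader expects at that offset. The sketch below is not the project's actual bytesUtils.scala; it is a minimal, hypothetical reconstruction of the big-endian decoding pattern the stack trace points at, showing how a value written as an ASCII string rather than as a 4-byte binary Int can push the read past the end of the row buffer:

    // Hypothetical sketch, not the project's source: a fixed-width
    // big-endian Int decode of the kind BinaryBytesUtils.toInt performs.
    object ToIntSketch {
      def toInt(bytes: Array[Byte], offset: Int): Int = {
        var v = 0
        // Reads exactly 4 bytes. If the cell holds fewer (e.g. the value
        // was stored as the ASCII string "12"), offset + i runs past the
        // end of the array -> ArrayIndexOutOfBoundsException, as in the log.
        for (i <- 0 until 4) {
          v = (v << 8) | (bytes(offset + i) & 0xff)
        }
        v
      }

      def main(args: Array[String]): Unit = {
        val binary = Array[Byte](0, 0, 0, 42)   // proper 4-byte binary Int
        println(toInt(binary, 0))               // prints 42
        val ascii = "12".getBytes("UTF-8")      // only 2 bytes
        println(toInt(ascii, 0))                // throws ArrayIndexOutOfBoundsException
      }
    }

If that reading is right, one thing worth checking is how test3's Int column was populated: data written as strings (for example via the HBase shell) rather than in the binary encoding the Astro schema declares could reproduce this failure.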

ljp215 commented 8 years ago

You need to set spark.hbase.host in your SparkConf. Example: sparkConf.set("spark.hbase.host", "10.2.2.2")
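Expanding that one-liner into a self-contained sketch (the app name and host address are placeholders, and the HBaseSQLContext entry point is assumed from this project; adjust to your setup):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hbase.HBaseSQLContext

    // Hypothetical driver showing where the setting goes.
    object HBaseHostExample {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()
          .setAppName("spark-sql-on-hbase-demo")   // placeholder app name
          .set("spark.hbase.host", "10.2.2.2")     // HBase host, per ljp215's suggestion
        val sc = new SparkContext(sparkConf)
        val hbaseCtx = new HBaseSQLContext(sc)     // assumed Astro SQL entry point
        hbaseCtx.sql("select * from test3").collect().foreach(println)
      }
    }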