Huawei-Spark / Spark-SQL-on-HBase

Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Apache License 2.0

Errors in BytesUtils toDouble and toLong #3

Closed: secfree closed this issue 9 years ago

secfree commented 9 years ago

When a column's data type is defined as double or long, running:

select * from t limit 3;

fails with:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 7, localhost): java.lang.IllegalArgumentException: offset (70) + length (8) exceed the capacity of the array: 71
        at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
        at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:578)
        at org.apache.spark.sql.hbase.util.BytesUtils$.toDouble(BytesUtils.scala:52)
        at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:92)
        at org.apache.spark.sql.hbase.HBaseRelation.org$apache$spark$sql$hbase$HBaseRelation$$setColumn(HBaseRelation.scala:885)
        at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:969)

or:

15/07/30 17:35:34 ERROR Executor: Exception in task 0.0 in stage 69.0 (TID 79)
java.lang.ArrayIndexOutOfBoundsException: 71
        at org.apache.spark.sql.hbase.util.BytesUtils$$anonfun$toLong$1.apply$mcVI$sp(BytesUtils.scala:85)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.sql.hbase.util.BytesUtils$.toLong(BytesUtils.scala:84)
        at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:95)
        at org.apache.spark.sql.hbase.HBaseRelation.org$apache$spark$sql$hbase$HBaseRelation$$setColumn(HBaseRelation.scala:885)
        at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:969)
        at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:965)

Is this a bug?
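Both traces fail the same way: the decoder tries to read a fixed 8 bytes (the width of a long/double) at an offset where fewer than 8 bytes remain, because the on-disk cells were not written as packed binary. A minimal, hypothetical sketch of that bounds failure (the names and the simplified decode loop are illustrative, not the project's actual BytesUtils):

```scala
object ToLongSketch {
  // Like HBase's Bytes.toLong, require exactly 8 bytes at the given offset,
  // then assemble a big-endian long from them.
  def toLong(bytes: Array[Byte], offset: Int): Long = {
    require(offset + 8 <= bytes.length,
      s"offset ($offset) + length (8) exceed the capacity of the array: ${bytes.length}")
    var v = 0L
    for (i <- offset until offset + 8) {
      v = (v << 8) | (bytes(i) & 0xFF)
    }
    v
  }

  def main(args: Array[String]): Unit = {
    // A true 8-byte binary long decodes fine.
    val binary = Array.fill[Byte](8)(0)
    println(toLong(binary, 0))

    // A string-encoded number is 12 bytes of ASCII, not a packed double;
    // decoding at offset 6 leaves only 6 bytes, so the bounds check fails.
    val stringEncoded = "42061.206177".getBytes("UTF-8")
    try toLong(stringEncoded, 6)
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Running the sketch prints the decoded value for the well-formed 8-byte input, then a message analogous to the "offset + length exceed the capacity of the array" failure in the first trace.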

xinyunh commented 9 years ago

I will take a look at it.

xinyunh commented 9 years ago

Hi @secfree,

I tried the following case:

sql(
  """CREATE TABLE test (shop_id Long, vender_id Double, sku_id Long,
    |PRIMARY KEY (sku_id))
    |MAPPED BY (test, COLS=[shop_id=d.shop_id,vender_id=d.vender_id])""".stripMargin
)
sql("INSERT INTO table test VALUES(0, 0.1, 0)")
sql("INSERT INTO table test VALUES(32, 1.2, 2)")
sql("INSERT INTO table test VALUES(12312, 2.1, 4)")
sql("INSERT INTO table test VALUES(11, 3.2, 3)")
sql("select * from test limit 3").show

And it gives me the correct result. Could you share your test cases and the data in the table?

secfree commented 9 years ago

Here is a row from my table, which I mapped from an existing table in HBase.

| 0|   1|   0|  0|  0|  6183|  0|  0.147|  0|  42061.206177 |

When mapping, if I set the columns' types to long or double, it causes the errors I reported.

If I set the types to string instead, it works fine.
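This behavior is consistent with the row above having been written as text (for example via the HBase shell), so each cell holds variable-length UTF-8 bytes rather than the fixed-width big-endian binary a LONG or DOUBLE mapping expects. A hypothetical sketch comparing the two byte forms of the same value:

```scala
import java.nio.ByteBuffer

object EncodingMismatch {
  def main(args: Array[String]): Unit = {
    // The value 6183 as stored by a text writer: 4 UTF-8 bytes "6183".
    val asString = "6183".getBytes("UTF-8")
    // The same value as fixed-width big-endian binary: always 8 bytes.
    val asBinary = ByteBuffer.allocate(8).putLong(6183L).array()

    println(asString.length) // 4
    println(asBinary.length) // 8
    // Declaring the column STRING matches the on-disk form, so reads succeed.
    // Declaring it LONG asks for 8 bytes where only 4 exist, shifting every
    // later column's offset until a read runs past the end of the row.
  }
}
```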

xinyunh commented 9 years ago

Hi @secfree,

I am sorry. For now, we don't directly support pre-existing tables created by HBase.

secfree commented 9 years ago

OK, maybe I misunderstood the documentation title: "Example 1: Create and query SparkSQL table map to existing Hbase table".

xinyunh commented 9 years ago

Hi @secfree,

It isn't your fault; it's our mistake. I will ask our lead to remove this misleading info. Sorry for the confusion.

xinyunh commented 9 years ago

Hi @secfree,

We just checked in a new feature to support pre-existing tables in HBase. You are welcome to check it out and give it a try. :) The syntax to create a mapping to a table created by HBase is similar to:

"CREATE TABLE tb0 (column2 INTEGER, column1 INTEGER, column4 FLOAT, column3 SHORT, PRIMARY KEY(column1)) MAPPED BY (testNamespace.ht0, COLS=[column2=family0.qualifier0, column3=family1.qualifier1, column4=family2.qualifier2]) IN StringFormat"
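Assuming, per the comment above, that the trailing `IN StringFormat` clause declares the cells to be string-encoded, the read path would parse text instead of slicing fixed-width binary. A hypothetical sketch of such a string-format decode (these helper names are illustrative, not the project's API):

```scala
object StringFormatDecode {
  // Under string format, a cell's bytes are the UTF-8 rendering of the
  // value, so decoding is text parsing rather than byte slicing.
  def decodeLong(cell: Array[Byte]): Long     = new String(cell, "UTF-8").toLong
  def decodeDouble(cell: Array[Byte]): Double = new String(cell, "UTF-8").toDouble

  def main(args: Array[String]): Unit = {
    // Values as an HBase shell `put` would store them: plain text bytes.
    println(decodeLong("6183".getBytes("UTF-8")))
    println(decodeDouble("42061.206177".getBytes("UTF-8")))
  }
}
```

Parsing per cell sidesteps the offset arithmetic entirely: each cell is self-delimiting, so a short value can no longer push later reads past the end of the row.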