Huawei-Spark / Spark-SQL-on-HBase

Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Apache License 2.0

Problem in reading integer value #28

Closed. frdo closed this issue 9 years ago

frdo commented 9 years ago

I write an int value (21 in this case) into HBase through HBase's Java API as follows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "table_name");
Put put = new Put(Bytes.toBytes("1"));
// Bytes.toBytes(int) writes the value as a plain 4-byte big-endian array
put.addImmutable(Bytes.toBytes("columnFamily"), Bytes.toBytes("int_column"), Bytes.toBytes(21));
table.put(put);
table.flushCommits();
table.close();

Then I read it through the connector as follows:

hbaseCtx.read().format("org.apache.spark.sql.hbase.HBaseSource")
                .option("namespace", "")
                .option("tableName", "table_name")
                .option("hbaseTableName", "table_name")
                .option("encodingFormat", "")
                .option("colsSeq", "row_key,int_column")
                .option("keyCols", "row_key,string")
                .option("nonKeyCols", "int_column,int,columnFamily,int_column")
                .load();

DataFrame df_table = hbaseCtx.table(tableName);

But I can't figure out why, when I print it with df_table.show(), I get -2147483627 instead of 21.

yzhou2001 commented 9 years ago

Astro does not actually use HBase's Bytes utilities, which were found not to be order-preserving. Please consider using Astro's own utilities in the org.apache.spark.sql.hbase.util package.

bomeng commented 9 years ago

Yes, the integer is stored as a binary array in HBase. To keep the byte ordering consistent with the integer ordering (for example, -1 must sort before 1), we need to handle the encoding ourselves rather than using the HBase Bytes utility.
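For anyone wondering where -2147483627 comes from: an order-preserving integer encoding typically flips the sign bit, and -2147483627 is exactly 0x80000015, i.e. 21 with the sign bit flipped. Below is a minimal sketch of that sign-bit-flip technique; it is illustrative only and not Astro's actual code (use BinaryBytesUtils from org.apache.spark.sql.hbase.util in real code).

// Illustrative only: the common sign-bit-flip encoding that keeps signed ints
// in unsigned lexicographic byte order (not Astro's actual implementation).
public final class OrderPreservingInt {
    public static byte[] encode(int v) {
        int flipped = v ^ Integer.MIN_VALUE;   // flip the sign bit
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),  (byte) flipped
        };
    }

    public static int decode(byte[] b) {
        int raw = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return raw ^ Integer.MIN_VALUE;        // flip the sign bit back
    }
}

Decoding a value written with plain Bytes.toBytes(21) (bytes 0x00 0x00 0x00 0x15) through such a decoder yields 0x80000015, which is -2147483627: exactly the value reported above.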

frdo commented 9 years ago

Thank you. It works now.

vikatskhay commented 8 years ago

Hi @frdo. How exactly did you make it work? We are facing the same problem. Thank you.

frdo commented 8 years ago

Sorry for the late reply; I hope this is still useful. I solved it by encoding the integer value as follows:

import org.apache.spark.sql.hbase.util.BinaryBytesUtils;
import org.apache.spark.sql.types.DataType;

// Encode the int with Astro's own utilities so the connector decodes it correctly
byte[] bytes_value = BinaryBytesUtils.create(DataType.fromCaseClassString("IntegerType")).toBytes(21);
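To tie this back to the write snippet from the first comment, the encoded bytes simply replace the plain Bytes.toBytes(21) call (a sketch reusing the same table and column names assumed above):

Put put = new Put(Bytes.toBytes("1"));
put.addImmutable(Bytes.toBytes("columnFamily"), Bytes.toBytes("int_column"), bytes_value); // Astro-encoded int
table.put(put);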