Open chenbodeng719 opened 1 year ago
A few hints for troubleshooting:
@LucaCanali Did I miss something?
On the client side:
- which version of Spark do you use?
- do you run it with --jars $JAR1,$JAR2 --packages org.apache.hbase:hbase-shaded-mapreduce:2.4.9 ?
> do you run it with --jars $JAR1,$JAR2 --packages org.apache.hbase:hbase-shaded-mapreduce:2.4.9 ?

I use Jupyter + Livy + Spark, and I put the jars on the Spark classpath.
```python
# sqlc is an existing SQLContext; get_json_object/col come from
# pyspark.sql.functions
from pyspark.sql.functions import col, get_json_object

def get_data_from_hbase():
    data_source_format = 'org.apache.hadoop.hbase.spark'
    tname = "candidate"
    tmap = "uid STRING :key, oridata STRING f1:data"
    df = sqlc.read.format(data_source_format) \
        .option('hbase.table', tname) \
        .option('hbase.columns.mapping', tmap) \
        .option('hbase.spark.use.hbasecontext', False) \
        .option("hbase.spark.pushdown.columnfilter", False) \
        .load()
    # print(df.count())
    tlist = ["tiq_ed4f3ab4-d2d4-4108-83c8-408ff3cf5f2b"]
    df = df \
        .filter(df.uid.isin(tlist)) \
        .withColumn("position_title",
                    get_json_object(col("oridata"),
                                    "$.basic.current_position.position_title")) \
        .select(
            "uid",
            "oridata",
        )
    df.show()
```
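For reference, `get_json_object` in the snippet above evaluates a JSONPath against the string column. A plain-Python sketch of the same extraction, with made-up sample data (the real schema of `f1:data` is not shown in the thread), is:

```python
import json

# Made-up sample of what the f1:data column might contain; only the
# nested path used by the query is modeled here.
oridata = json.dumps({
    "basic": {"current_position": {"position_title": "engineer"}}
})

# Plain-Python equivalent of
#   get_json_object(col("oridata"), "$.basic.current_position.position_title")
doc = json.loads(oridata)
title = doc["basic"]["current_position"]["position_title"]
print(title)  # -> engineer
```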
If I set hbase.spark.pushdown.columnfilter to false, it works; if true, it does not.
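As a side note, when the jars cannot be placed on the classpath by hand, the package coordinates from the troubleshooting hints above can also be passed when building the session from Python. A sketch only (assumes a working pyspark installation; the coordinate is the one suggested earlier, not verified against this setup):

```python
# Sketch: pass the connector package via spark.jars.packages instead of
# editing the classpath. Requires pyspark and network access to Maven.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hbase-connector-check")
         .config("spark.jars.packages",
                 "org.apache.hbase:hbase-shaded-mapreduce:2.4.9")
         .getOrCreate())
```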
Does it work from the spark-shell?
> Does it work from the spark-shell?

Same error.
My error is different from the one in the md ("java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/spark/datasources/JavaBytesEncoder"). Maybe it's a server configuration error?
Can you try using HBase 2.3.x ?
> Can you try using HBase 2.3.x ?

There is no HBase 2.3.x available in the AWS EMR releases.
Unfortunately I cannot test against HBase (server) 2.4 yet. I have just compiled the connector jars using Spark 3.3.1 and HBase 2.4.15 and linked the URLs at https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md; not sure if it helps, though.
Since our query is more complex than a range query and may not trigger the pushdown filter anyway, we will ignore this for now. Thanks for your patience; we'll test later. @LucaCanali
Hi @LucaCanali, have you managed to test against HBase 2.4? I am hitting this issue:
```scala
def catalog = s"""{
                 |"table":{"namespace":"dev", "name":"amilosevic"},
                 |"rowkey":"key",
                 |"columns":{
                 |"col0":{"cf":"rowkey", "col":"key", "type":"binary"},
                 |"col1":{"cf":"a", "col":"col1", "type":"string"}
                 |}
                 |}""".stripMargin

scala> spark.sqlContext.read.option("catalog", catalog).format("org.apache.hadoop.hbase.spark").load()
java.lang.NullPointerException
  at org.apache.hadoop.hbase.spark.HBaseRelation.
```
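One Spark-independent sanity check on the snippet above: the catalog is plain JSON (the leading pipes are only stripMargin markers), so it can be validated with Python's json module before suspecting the connector itself. A sketch using the same values as in the thread:

```python
import json

# The same catalog as the Scala snippet above, with the stripMargin
# pipes removed; table, namespace and column names come from the thread.
catalog = """{
"table":{"namespace":"dev", "name":"amilosevic"},
"rowkey":"key",
"columns":{
"col0":{"cf":"rowkey", "col":"key", "type":"binary"},
"col1":{"cf":"a", "col":"col1", "type":"string"}
}
}"""

parsed = json.loads(catalog)  # raises json.JSONDecodeError if malformed
print(sorted(parsed["columns"]))  # -> ['col0', 'col1']
```

Since the JSON parses cleanly, a malformed catalog string can at least be ruled out as the cause of the NullPointerException.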
This is on a local Spark 3.3.0 + HBase 2.4.14 setup, using your connector jars for Spark 3.3.1 (placed in hbase/lib and passed to spark-submit).
After some inspection, predicate pushdown seems to be failing in this class: https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
```
Caused by: java.lang.ClassCastException: com.google.protobuf.LiteralByteString cannot be cast to org.apache.hbase.thirdparty.com.google.protobuf.ByteString
  at org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter.parseFrom(SparkSQLPushDownFilter.java:208)
  ... 12 more
```
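The ClassCastException above looks like the classic shaded-dependency mismatch: the filter appears to be built with the unshaded com.google.protobuf classes while the server expects the relocated org.apache.hbase.thirdparty ones. A toy Python sketch (not HBase code) of why two same-looking types from different packages are not interchangeable:

```python
# Illustration only: two classes with identical-looking names but from
# different "packages" are distinct types, so treating one as the other
# fails, just like com.google.protobuf.ByteString vs the relocated
# org.apache.hbase.thirdparty.com.google.protobuf.ByteString.
class ByteStringOriginal:      # stand-in for com.google.protobuf.*
    pass

class ByteStringRelocated:     # stand-in for org.apache.hbase.thirdparty.*
    pass

value = ByteStringOriginal()
print(isinstance(value, ByteStringRelocated))  # -> False: a "cast" would fail
```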
I added the jars on the HBase server side according to https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md, but it doesn't work for me; I get the error described above. Please help me.