Open mgyboom opened 1 year ago
Hello @mgyboom, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).
... helps to solve failures accessing Kyuubi.
The stack trace indicates that you are accessing HiveServer2 instead of Kyuubi.
BTW, it does not support writing, and this approach is not efficient; it exists mostly for security purposes.
What are your original requirements?
Yes, I'm making my Spark program access Hive data sources through JDBC. The 'kyuubi hive jdbc dialect plugin' seems to let Spark read Hive tables, but it cannot write to them. Is it because the plugin does not support writing?
also cc @bowenliang123
Can you check the HiveServer2 log to find the SQL string it actually received?
Plus, could you show us the DDL of the tables ibond.a1000w_30d and ibond.mgy_test_1? Metadata of the column types could help to investigate this issue.
Btw, using JDBC in Spark to write data to Kyuubi via HiveDriver or KyuubiHiveDriver does not work. It's known that Spark's JDBC writer forces use of the driver's addBatch method, and this method is currently not implemented in either driver.
Yes, I found that the PreparedStatement.addBatch method is called in JdbcUtils.savePartition, but HivePreparedStatement does not implement any specific logic and directly throws SQLFeatureNotSupportedException("Method not supported").
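The failure mode can be sketched with a minimal stand-in (the stub class below is hypothetical and only mirrors the reported behavior; the real implementation lives in the Hive/Kyuubi JDBC drivers):

```java
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical stub mirroring the reported behavior: addBatch() is declared
// by java.sql.PreparedStatement, but the driver's implementation just throws.
class StubHivePreparedStatement {
    public void addBatch() throws SQLException {
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
}

public class AddBatchDemo {
    // Attempts a batched add and returns the resulting error message.
    public static String tryAddBatch() {
        try {
            new StubHivePreparedStatement().addBatch();
            return "ok";
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAddBatch()); // prints "Method not supported"
    }
}
```

Any batched write path that reaches addBatch() fails this way, regardless of the data being written.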
Does Kyuubi-Hive-JDBC plan to support Spark Write DataFrame into Hive Tables in the future?
Plus, could you show us the DDL of the tables ibond.a1000w_30d and ibond.mgy_test_1? Metadata of the column types could help to investigate this issue.
DDL is as follows:
create table a1000w_30d
(
id1 bigint,
y int,
x0 string
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = ',') stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://namenode:8020/user/hive/warehouse/ibond.db/a1000w_30d'
tblproperties ('skip.header.line.count' = '1');
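For context on the write path being discussed: a JDBC writer such as Spark's prepares one parameterized INSERT for the target table and then batches rows into it via addBatch(). A simplified sketch of that statement construction (illustrative only; the real logic is in Spark's JdbcUtils):

```java
import java.util.Collections;
import java.util.List;

public class InsertStatementDemo {
    // Builds the parameterized INSERT a JDBC writer prepares before
    // batching rows with addBatch() (simplified sketch).
    public static String getInsertStatement(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String placeholders = String.join(", ",
                Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(getInsertStatement("ibond.a1000w_30d",
                List.of("id1", "y", "x0")));
        // prints: INSERT INTO ibond.a1000w_30d (id1, y, x0) VALUES (?, ?, ?)
    }
}
```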
Is ibond.mgy_test_1 a non-existent table?
I think the problem is not the type mapping. BIGINT is supported by Hive, and Spark correctly converts its LongType to Hive's BIGINT in Spark's JdbcUtils.getCommonJDBCType, following the KyuubiHiveDialect.
I guess it is more likely about the auto-generated column names in Spark: the . in a column name puzzles HiveServer2.
Try to work around this with an explicit schema definition for the JDBC writer in Spark. (But then you will face the unimplemented addBatch problem. 🤣)
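To illustrate why the dot confuses the server: in HiveQL an unquoted a.b is parsed as column b of table a, while backtick-quoting keeps the whole string as one identifier. A small hypothetical helper (doubling embedded backticks per HiveQL's quoted-identifier rules):

```java
public class IdentifierQuoting {
    // Backtick-quotes an identifier so HiveQL treats the whole string,
    // dots included, as a single column name; embedded backticks are doubled.
    public static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        // A Spark-style auto-generated column name containing a dot:
        System.out.println(quote("avg(t.x0)")); // prints `avg(t.x0)`
    }
}
```

Defining an explicit schema on the DataFrame avoids generating such names in the first place, which is the workaround suggested above.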
Is ibond.mgy_test_1 a non-existent table?
Yes, ibond.mgy_test_1 does not exist.
Thank you for your answer.
@mgyboom I made a simple design based on HDFS, just to set the ball rolling: https://blog.csdn.net/m0_58032574/article/details/128846418?spm=1001.2014.3001.5501
Code of Conduct
Search before asking
Describe the bug
I use the hive jdbc dialect plugin as described here: https://kyuubi.readthedocs.io/en/latest/extensions/engines/spark/jdbc-dialect.html But I'm running into some issues in my Spark program.
I built kyuubi-extension-spark-jdbc-dialect_2.12-1.7.0-SNAPSHOT.jar from the master branch and put it into $SPARK_HOME/jars.
Here is my code:
console output:
Affects Version(s)
master
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?