yihao-tcf opened this issue 1 year ago
What's the result of spark-shell? Compared with Kyuubi, spark-sql has some hacks on the Hive isolated classloader; not sure if they are related.
I can't reproduce.
Kyuubi 1.7.0 Spark 3.2.3 Hudi 0.13.0
Hello, can you provide your Kyuubi Server Configurations and Kyuubi Engine Configurations? Thanks.
It succeeds through spark-shell.
The Kyuubi server only configures SPARK_HOME in kyuubi-env.sh. The Spark configuration is as follows:
SPARK_HOME/conf/spark-defaults.conf
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
SPARK_HOME/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
Hello buddy, could you provide a screenshot of the corresponding SQL Details and SQL/DataFrame Properties information found in the SQL/DataFrame menu bar of the Spark UI after executing the DML statement through Kyuubi? Let me compare the differences between us. Thank you.
As shown in the following figure: I connect to the Spark SQL engine through Kyuubi and set the parameter hoodie.schema.on.read.enable=true. Debugging hudi-spark3-datasource shows that the setting of hoodie.schema.on.read.enable does not take effect, and my table is treated as a DataSource v1 table. Check out HUDI-4178 for more details.
This is already my second time switching versions: Hudi 0.13.0, Spark 3.3.2, Kyuubi 1.7.1.
Would you mind trying kyuubi.engine.single.spark.session=true (add it in kyuubi-defaults.conf)? One difference between Kyuubi and spark-sql / spark-shell is that Kyuubi uses a different SparkSession for each session (JDBC connection or Beeline session), while the latter only uses one global SparkSession.
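A minimal sketch of this difference, assuming a local Spark 3.x build (object and variable names are illustrative, not Kyuubi's actual internals): each connection gets something like a SparkSession.newSession(), which shares the SparkContext but keeps its own SQL conf, so a SET in one connection is invisible to another.

import org.apache.spark.sql.SparkSession

object SessionIsolationSketch extends App {
  // One Spark application, as in a single Kyuubi engine.
  val root = SparkSession.builder().master("local[1]").getOrCreate()

  // Kyuubi hands each JDBC/Beeline connection its own session, roughly like:
  val connA = root.newSession() // shared SparkContext, isolated SQL conf
  val connB = root.newSession()

  connA.sql("SET hoodie.schema.on.read.enable=true")
  println(connA.conf.get("hoodie.schema.on.read.enable", "unset")) // true
  println(connB.conf.get("hoodie.schema.on.read.enable", "unset")) // unset

  root.stop()
}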
Hudi 0.13.0
Spark 3.3.2
Kyuubi 1.7.1
create table hudi_mor_tbl (
id int,
name string,
price double,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts'
);
set hoodie.schema.on.read.enable=true;
set hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true;
alter table hudi_mor_tbl drop column price;
BTW, you can find me in the Kyuubi WeChat user group or on Slack, and we can communicate offline.
Thank you very much. After setting the kyuubi.engine.single.spark.session=true configuration, my problem has been resolved. But I still have a doubt: I checked the previous errors through the Spark UI, and all my SQL was executed in one session, so why did it report an error?
Thank you very much for handling this issue for me. May I know how to join the Kyuubi WeChat user group?
Please check the FAQ: https://github.com/apache/kyuubi/discussions/2481
I checked the previous errors through the Spark UI, and all my SQL was executed in one session, so why did it report an error?
This is really a little strange; parameters set within the same session should take effect.
After setting the kyuubi.engine.single.spark.session=true configuration, my problem has been resolved.
One possibility is that Hudi holds the wrong SparkSession instance.
Please note that kyuubi.engine.single.spark.session=true is not suggested in common cases: when it is enabled, a SET x=y in one JDBC connection also affects the others (because all connections in a Spark application share one SparkSession).
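A rough sketch of that caveat, under the same assumptions as the snippet above (illustrative names): in single-session mode every connection is handed the same SparkSession, so session state is shared.

import org.apache.spark.sql.SparkSession

object SharedSessionCaveat extends App {
  val shared = SparkSession.builder().master("local[1]").getOrCreate()

  // With kyuubi.engine.single.spark.session=true, both "connections"
  // effectively hold the same session.
  val connA = shared
  val connB = shared

  connA.sql("SET hoodie.schema.on.read.enable=true")
  println(connB.conf.get("hoodie.schema.on.read.enable", "unset")) // true

  shared.stop()
}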
This is indeed possible. I have now noticed that Hudi obtains its SparkSession from SparkSession.active at initialization, which may pick the wrong one.
org.apache.spark.sql.hudi.catalog.HoodieCatalog
val spark: SparkSession = SparkSession.active

org.apache.spark.sql.hudi.catalog.HoodieCatalog#loadTable
val schemaEvolutionEnabled: Boolean = spark.sessionState.conf.getConfString(
  DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key,
  DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.defaultValue.toString).toBoolean
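A hedged sketch of how SparkSession.active can bite here, assuming (per the snippet above) that HoodieCatalog captures SparkSession.active once into a val: the captured session is whichever one was active on that thread at capture time, not necessarily the session that later runs SET. Names are illustrative.

import org.apache.spark.sql.SparkSession

object ActiveSessionPitfall extends App {
  val root = SparkSession.builder().master("local[1]").getOrCreate()
  val connA = root.newSession()
  val connB = root.newSession()

  // Suppose connection A's session happens to be active when the catalog
  // is initialized.
  SparkSession.setActiveSession(connA)
  // Mirrors `val spark: SparkSession = SparkSession.active` in HoodieCatalog.
  val captured = SparkSession.active

  // Connection B later enables schema evolution in *its* session...
  connB.sql("SET hoodie.schema.on.read.enable=true")

  // ...but the catalog still consults the captured session and sees nothing.
  println(captured.conf.get("hoodie.schema.on.read.enable", "unset")) // unset

  root.stop()
}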
I can understand it as Hudi obtaining an arbitrary SparkSession, which makes the corresponding session's configuration inaccessible. In this light, it seems that neither party has a bug on its own; the issue arises from their interaction. It appears that resolving this problem could be quite challenging.
@pan3793 When I set kyuubi.engine.single.spark.session=true, I still face the error below; these tables are continuously updated by Spark Streaming. Caused by: java.io.FileNotFoundException: No such file or directory: Did you ever face the same issue?
Yes, if the Kyuubi Server's single-session mode is enabled, new SparkSessions will not be created. For how to handle this, refer to the Hudi issue: https://github.com/apache/hudi/issues/7452
Is it the same thing? My error message is due to not being able to find the file for that commit. I use the default settings for hoodie.keep.max/min.commits, which should keep at least 20 commits. After refreshing the table, it works fine via spark-sql. Why does Kyuubi keep getting this error?
@njalan it should not be the same issue. I'm pointing to this issue/discussion because Kyuubi's multiple sessions vs. spark-sql's single session may cause some differences, especially when someone claims everything works well in spark-sql but not in Kyuubi.
Describe the bug
Spark version: 3.2.3
Hudi version: 0.13.0
Description: I connect to the Spark SQL query engine through Kyuubi, expose the service using the Hive JDBC driver, and drop Hudi table columns using Hudi Schema Evolution. The error message is: DROP COLUMN is only supported with v2 tables. However, I have no problem dropping columns from the Hudi table through Spark SQL with the same Hudi Schema Evolution feature. Using Hudi Schema Evolution requires two configurations:
set hoodie.schema.on.read.enable=true;
set hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true;
If these are not set in Spark SQL either, dropping a Hudi table column prompts the same error message. So it seems that the configuration I set through the Kyuubi Spark JDBC driver does not take effect.
Using Spark SQL operations: (screenshot)
Using Kyuubi operations: (screenshot)
Affects Version(s)
1.6.0
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
Kyuubi Engine Configurations
Additional context
No response
Are you willing to submit PR?