apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

Error when upgrading to hudi 0.12.0 from 0.9.0 #7060

Closed: navbalaraman closed this issue 1 year ago

navbalaraman commented 2 years ago

We are using Spark 3.1.2 with Hudi 0.9.0 in our application, with AWS S3 and the AWS Glue Catalog to store and expose the ingested data. After a source data change, some of the new records now arrive with null values for columns that already exist in the table schema (the schema was built from earlier records that had values for those columns). Based on some of the reported issues (e.g. HUDI-4276), we identified that this could be resolved by upgrading to Hudi 0.12.0. However, when upgrading Hudi we hit the error below. Can you please provide some insight into what is causing this? (Only the pom versions were changed, no code changes.)
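For reference, the write is a plain DataFrame write through the Hudi datasource. A simplified sketch of the failing call is below; the table name and S3 path are placeholders, and the record key / partition / precombine fields are the ones we configure in the job:

import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

// Simplified sketch only; the real job reads these values from configuration.
def writeToHudi(df: DataFrame): Unit = {
  df.write
    .format("hudi")
    .option(HoodieWriteConfig.TBL_NAME.key(), "my_table")                      // placeholder table name
    .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "_id.oid")
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "_col_partition")
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "_update_ts")
    .mode(SaveMode.Append)
    .save("s3://my-bucket/hudi/my_table")                                      // placeholder base path
}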

Error:

org.apache.spark.sql.adapter.Spark3_1Adapter
java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:39)
    at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
    at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:65)
    at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:65)
    at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
    at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:540)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)

pom.xml

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.12</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.12</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hudi</groupId>
      <artifactId>hudi-spark3-bundle_2.12</artifactId>
      <version>0.12.0</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.spark</groupId>
      <artifactId>mongo-spark-connector_2.12</artifactId>
      <version>3.0.1</version>
    </dependency>
namuny commented 2 years ago

The naming scheme of the Hudi Spark bundle jars has changed since 0.9.0: hudi-spark3-bundle now points to Spark 3.2. For Spark 3.1, you'll now need to pull hudi-spark3.1-bundle. See the release notes for reference.

navbalaraman commented 2 years ago

Thanks @namuny. I updated the pom reference as below, but I am still getting the same error.

<!-- https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3.1-bundle -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.1-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>

org.apache.spark.sql.adapter.Spark3_1Adapter
java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter

Also, below are some other relevant dependencies in my pom.xml:

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.12.12</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
nsivabalan commented 2 years ago

Is this the EMR version of Spark or OSS Spark?

nsivabalan commented 1 year ago

@navbalaraman : gentle ping.

navbalaraman commented 1 year ago

@nsivabalan EMR 6.8 supports only Hudi 0.11.1, but we needed to get on 0.12.0 to have the null value issue resolved, so I upgraded Spark to 3.3.0 and used the Hudi Spark bundle below to get it working. Thanks for your inputs.

<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.3-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>
navbalaraman commented 1 year ago

@nsivabalan Now I am seeing an issue with the upgraded code running on EMR 6.8.0. The initial pull of the data happened with Hudi 0.9.0 and Spark 3.1.2 on EMR 6.5. The incremental pull with Hudi 0.12.0 and Spark 3.3.0 on EMR 6.8.0 is now failing with the error below. Any thoughts on what is needed to make this incremental pull work without having to wipe out the existing data and redo the initial pull with the new version?

It throws an error saying that "_col_partition", which is the partition field (set as shown below), is missing. I verified in the Glue catalog and the column does exist. In the existing parquet files the partition path appears as "_hoodie_partition_path":"_col_partition=2022-10-12", and the field is configured via DataSourceWriteOptions.PARTITIONPATH_FIELD.key() -> "_col_partition".

Also, on both the initial and incremental pulls we set option(DataSourceWriteOptions.DROP_PARTITION_COLUMNS.key(), true) to avoid a duplicate partition column showing up in the Glue catalog. Other configs (a sketch of how they are applied follows the block):

HoodieWriteConfig.TBL_NAME.key() -> tableName,
DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE",
DataSourceWriteOptions.RECORDKEY_FIELD.key() -> "_id.oid",
DataSourceWriteOptions.PARTITIONPATH_FIELD.key() -> "_col_partition",
DataSourceWriteOptions.PRECOMBINE_FIELD.key() -> "_update_ts",
DataSourceWriteOptions.HIVE_SYNC_ENABLED.key() -> "true",
DataSourceWriteOptions.HIVE_URL.key() -> s"jdbc:hive2://${masterDns}:10000",
DataSourceWriteOptions.HIVE_DATABASE.key() -> dbName,
DataSourceWriteOptions.HIVE_TABLE.key() -> tableName,
DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key() -> classOf[MultiPartKeysValueExtractor].getName,
DataSourceWriteOptions.HIVE_PARTITION_FIELDS.key() -> "_col_partition",
HoodieCleanConfig.CLEANER_FILE_VERSIONS_RETAINED.key() -> "1",
HoodieCleanConfig.CLEANER_POLICY.key() -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name(),
HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE.key() -> "true",
HoodieSyncConfig.META_SYNC_DATABASE_NAME.key() -> dbName,
HoodieSyncConfig.META_SYNC_TABLE_NAME.key() -> tableName,

.option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
.option(DataSourceWriteOptions.DROP_PARTITION_COLUMNS.key(), true)
.option(DataSourceWriteOptions.RECONCILE_SCHEMA.key(), true)
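To show how these pieces fit together: the key -> value pairs above go into a Map that is passed via .options(...), and the remaining .option calls are chained on the same writer. A simplified sketch, with df, hudiOptions, and basePath standing in for our actual job values:

import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.hudi.DataSourceWriteOptions

// Simplified sketch of the incremental upsert; hudiOptions carries the key -> value pairs listed above.
def incrementalPull(df: DataFrame, hudiOptions: Map[String, String], basePath: String): Unit = {
  df.write
    .format("hudi")
    .options(hudiOptions)
    .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
    .option(DataSourceWriteOptions.DROP_PARTITION_COLUMNS.key(), "true")
    .option(DataSourceWriteOptions.RECONCILE_SCHEMA.key(), "true")
    .mode(SaveMode.Append)
    .save(basePath)
}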

ERROR STACK TRACE:

diagnostics: User class threw exception: org.sparkproject.guava.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Cannot find columns: '_col_partition' in the schema[StructField(_hoodie_commit_time,StringType,true),StructField(_hoodie_commit_seqno,StringType,true),StructField(_hoodie_record_key,StringType,true),StructField(_hoodie_partition_path,StringType,true),StructField(_hoodie_file_name,StringType,true),StructField(_col_origin_ts,LongType,true),StructField(_col_update_ts,LongType,true),StructField(devicetype,StringType,true),StructField(deviceid,StringType,true),StructField(createdOn,TimestampType,true),StructField(channel,StringType,true),StructField(source,StringType,true),StructField(_class,StringType,true),StructField(model,StringType,true),StructField(commandoperation,StringType,true),StructField(phases,ArrayType(StructType(StructField(dateTime,TimestampType,true),StructField(dcmErrorCode,StringType,true),StructField(message,StringType,true),StructField(payload,StringType,true),StructField(phaseDescription,StringType,true),StructField(phaseId,IntegerType,true),StructField(statusCode,IntegerType,true),StructField(statusDescription,StringType,true)),true),true),StructField(guid,StringType,true),StructField(commanddatetime,TimestampType,true),StructField(_id,StructType(StructField(oid,StringType,true)),true),StructField(isacsettingson,BooleanType,true),StructField(command,StringType,true),StructField(returncode,StringType,true),StructField(osversion,StringType,true),StructField(totaltimetaken,LongType,true),StructField(dcmtype,StringType,true),StructField(status,StringType,true),StructField(vin,StringType,true),StructField(correlationid,StringType,true),StructField(apprequestnumber,StringType,true),StructField(profilename,StringType,true),StructField(make,StringType,true),StructField(commandinitiatedby,StringType,true),StructField(xlocale,StringType,true),StructField(commandrequest,StringType,true),StructField(modifiedOn,TimestampType,true),StructField(appbrand,StringType,true)] at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:177) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:272) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:320) at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:306) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164) at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:175) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248) at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:226) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:175) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:306) at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:246) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215) at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) at scala.collection.immutable.List.foldLeft(List.scala:91) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188) at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:227) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:223) at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:172) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:223) at 
org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:187) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:208) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:78) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:213) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:552) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:213) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:212) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:78) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:76) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:68) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:93) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:91) at org.apache.spark.sql.SparkSession.table(SparkSession.scala:604) at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:540) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$3(HoodieSparkSqlWriter.scala:658) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$3$adapted(HoodieSparkSqlWriter.scala:655) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:655) at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:734) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:338) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81) at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)

navbalaraman commented 1 year ago

@nsivabalan gentle reminder. any insights is appreciated.

codope commented 1 year ago

@navbalaraman The issue with the partition field was fixed recently by https://github.com/apache/hudi/pull/7132. Please open another issue if it still persists.