apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] "Failed to read schema/check compatibility" on Hudi upgrade from 0.12.2 to Hudi 0.14.1 #11865

Open · mzheng-plaid opened this issue 2 months ago

mzheng-plaid commented 2 months ago

Describe the problem you faced

We're upgrading from Hudi 0.12.2 to Hudi 0.14.1, and all of our log ingestion jobs are failing with:

    Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3://...
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:844)
        at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:854)
    writerSchema: ...
    tableSchema: ...

The only diff between the writer schema and the table schema appears to be the snippet below, in all of our jobs where meta is a record type (table_name is a placeholder for each affected table):

    diff writer_schema_table_name.json table_schema_table_name.json
    65,66c65
    <           "name": "meta",
    <           "namespace": "hoodie.table_name.table_name_record",
    ---
    >           "name": "table_name_meta",

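For what it's worth, Avro itself treats a nested record whose only difference is its name/namespace as incompatible, which would explain the failure. A minimal standalone check with the plain Avro API (not Hudi's exact validation path; the field contents here are made up):

    import org.apache.avro.{Schema, SchemaCompatibility}

    // Two schemas that differ only in the nested record's name/namespace,
    // mirroring the diff above (empty field lists for brevity).
    object NameMismatchDemo extends App {
      val writerSchema = new Schema.Parser().parse(
        """{"type":"record","name":"table_name_record","namespace":"hoodie.table_name",
          |  "fields":[{"name":"meta","type":{"type":"record","name":"meta",
          |    "namespace":"hoodie.table_name.table_name_record","fields":[]}}]}""".stripMargin)
      val tableSchema = new Schema.Parser().parse(
        """{"type":"record","name":"table_name_record","namespace":"hoodie.table_name",
          |  "fields":[{"name":"meta","type":{"type":"record","name":"table_name_meta",
          |    "fields":[]}}]}""".stripMargin)
      // Avro matches records by full name; a name-only diff is INCOMPATIBLE.
      val result = SchemaCompatibility.checkReaderWriterCompatibility(tableSchema, writerSchema)
      println(result.getType) // INCOMPATIBLE
    }
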
Oddly, the table version was bumped even though the commit failed, so we ended up having to run a tedious bulk downgrade command. That seems very surprising.

To Reproduce

Unclear; it seems like a one-time table-version upgrade step did not run, for whatever reason.

Expected behavior

  1. Is this expected behavior? How do we manually upgrade our tables if not?
  2. The table version should not be bumped if the commit fails

Environment Description

We are running on EMR 7.2

mzheng-plaid commented 2 months ago

OK, I think the root cause is that the upgrade silently turned on schema validation via https://github.com/apache/hudi/blob/release-0.14.1/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java#L822

    boolean shouldValidate = config.shouldValidateAvroSchema();
    boolean allowProjection = config.shouldAllowAutoEvolutionColumnDrop();
    if ((!shouldValidate && allowProjection)
        || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()
        || StringUtils.isNullOrEmpty(config.getSchema())
    ) {
      // Check not required
      return;
    }
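
In other words, validation is now skipped only when both knobs line up; paraphrasing the gate above (my own simplification, not actual Hudi code):

    // Validation is skipped only when hoodie.avro.schema.validate is false
    // AND hoodie.datasource.write.schema.allow.auto.evolution.column.drop is
    // true (ignoring the empty-timeline / empty-schema shortcuts).
    def validationSkipped(shouldValidate: Boolean, allowProjection: Boolean): Boolean =
      !shouldValidate && allowProjection

    assert(validationSkipped(shouldValidate = false, allowProjection = true))   // skipped
    assert(!validationSkipped(shouldValidate = true,  allowProjection = true))  // still validates
    assert(!validationSkipped(shouldValidate = false, allowProjection = false)) // still validates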

Previously in 0.12.2 https://github.com/apache/hudi/blob/release-0.12.2/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java#L749C1-L754C6

    if (!config.getAvroSchemaValidate() || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
      // Check not required
      return;
    }

Questions:

  1. Why is there a coupling between hoodie.datasource.write.schema.allow.auto.evolution.column.drop and disabling schema validation? Why is schema validation silently turned on by default now?
  2. Did some table upgrade silently not run successfully causing the schemas to be broken? The namespace/name change seems like an internal detail.
  3. It seems like we can just turn off schema validation, but is there a way to fix the schemas of our tables without a rewrite?
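
For reference, explicitly disabling validation on the write path would look roughly like this (an untested sketch using standard Spark datasource options; the table name and base path are placeholders):

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Untested workaround sketch: explicitly set hoodie.avro.schema.validate
    // to "false" when writing (table name and base path are placeholders).
    def upsertWithoutSchemaValidation(df: DataFrame): Unit =
      df.write
        .format("hudi")
        .option("hoodie.table.name", "table_name")
        .option("hoodie.datasource.write.operation", "upsert")
        .option("hoodie.avro.schema.validate", "false")
        .mode(SaveMode.Append)
        .save("s3://bucket/path/to/table")
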
ad1happy2go commented 2 months ago

Referencing slack thread for this discussion - https://apache-hudi.slack.com/archives/C4D716NPQ/p1723826030979209

mzheng-plaid commented 1 month ago

@ad1happy2go hmm, setting hoodie.datasource.write.schema.allow.auto.evolution.column.drop to true still doesn't skip the schema validation check; any idea why?

We're hard-blocked on the upgrade by this issue and it's quite painful; let me know if you have any ideas on how to work around it.

mzheng-plaid commented 1 month ago

It seems like there is a second issue: if you use the default Hudi configs, this line (https://github.com/apache/hudi/blob/release-0.14.1/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L751):

      parameters.getOrDefault(HoodieWriteConfig.AVRO_SCHEMA_VALIDATE_ENABLE.key(), "true")

will just ignore the default for hoodie.avro.schema.validate and silently enable schema validation... you now need to explicitly set hoodie.avro.schema.validate to false.
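
To spell out the effect (a simplified sketch with stand-in values, not the actual Hudi classes; I'm assuming the config key's own default is "false", which is what the behavior above implies):

    // The Spark layer passes a hard-coded "true" fallback, so the config
    // key's own default ("false") is never consulted unless the user sets
    // the key explicitly.
    val userParams: Map[String, String] = Map.empty // user relied on defaults
    val effective = userParams.getOrElse("hoodie.avro.schema.validate", "true")
    println(effective) // "true" -> schema validation silently enabled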

Seems like this regression was introduced in https://github.com/apache/hudi/commit/06c8fa5a62fab607d3be6e321a580d9cf13b572a#diff-8bda4b2174721fd642a543528283[…]a320c1d9e1366b27be86bd548d48aR527

ad1happy2go commented 1 month ago

Thanks a lot @mzheng-plaid for the detailed explanation and for triaging the issue. This sounds reasonable, and we should highlight it in our release docs. The default config is confusing in this case.

Created JIRA for tracking the fix - https://issues.apache.org/jira/browse/HUDI-8173