Open geserdugarov opened 1 week ago
CI is broken on current master. Some test cases are flaky, but the problem with testSecondaryIndexWithClusteringAndCleaning
looks reproducible. Checked it here: https://github.com/apache/hudi/pull/12264
Got a CI failure on tests not affected by this change:
[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 71.934 s <<< FAILURE! - in org.apache.hudi.functional.TestStructuredStreaming
[ERROR] testStructuredStreamingWithClustering{boolean}[1] Time elapsed: 11.999 s <<< ERROR!
java.util.NoSuchElementException: No value present in Option
at org.apache.hudi.common.util.Option.get(Option.java:93)
at org.apache.hudi.common.table.HoodieTableMetaClient.lambda$new$0(HoodieTableMetaClient.java:180)
at org.apache.hudi.common.util.Option.orElseGet(Option.java:153)
at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:180)
at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:791)
at org.apache.hudi.common.table.HoodieTableMetaClient.access$100(HoodieTableMetaClient.java:106)
at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:899)
at org.apache.hudi.HoodieDataSourceHelpers.allCompletedCommitsCompactions(HoodieDataSourceHelpers.java:126)
at org.apache.hudi.functional.TestStructuredStreaming.waitTillAtleastNCommits(TestStructuredStreaming.scala:225)
at org.apache.hudi.functional.TestStructuredStreaming.$anonfun$structuredStreamingForTestClusteringRunner$1(TestStructuredStreaming.scala:409)
- Test Secondary Index With Updates Compaction Clustering Deletes *** FAILED ***
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
...
at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.validateSecondaryIndex(TestSecondaryIndex.scala:370)
Will try to rebase and restart.
After a second CI run, I still got a failure unrelated to this PR:
- Test Secondary Index With Updates Compaction Clustering Deletes *** FAILED ***
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
...
at org.apache.spark.sql.hudi.command.index.TestSecondaryIndex.validateSecondaryIndex(TestSecondaryIndex.scala:373)
Couldn't reproduce this issue locally.
Change Logs
With
hoodie.datasource.write.row.writer.enable = false
, bulk insert into a COW table could be performed multiple times, but only the first insert produced parquet files; subsequent inserts produced log files, despite the table type being COW. To prevent this, a restriction is added that forbids calling
AppendHandleFactory
for COW tables. The full discussion is available in https://github.com/apache/hudi/issues/12133.
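The restriction described above can be sketched as follows. This is a simplified, hypothetical illustration of the guard, not the actual patch: the class, method, and factory names here only mirror Hudi concepts (`AppendHandleFactory` writes log files, a create-handle path writes base parquet files) and are assumptions for the example.

```java
// Hypothetical sketch of the fix: refuse the append (log-file) handle path
// when the table type is COPY_ON_WRITE, so repeated bulk inserts cannot
// silently produce log files in a COW table.
public class CowAppendGuard {

    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

    /**
     * Stand-in for the handle-factory selection. Returns the factory name
     * that would be used; throws if the append path is requested for COW.
     */
    static String selectHandleFactory(TableType tableType, boolean rowWriterEnabled) {
        if (!rowWriterEnabled && tableType == TableType.COPY_ON_WRITE) {
            // Without this check, a second bulk insert into a COW table would
            // fall through to the append path and produce log files.
            return "CreateHandleFactory";
        }
        if (tableType == TableType.COPY_ON_WRITE) {
            return "CreateHandleFactory";
        }
        return "AppendHandleFactory";
    }

    public static void main(String[] args) {
        // COW tables always get the base-file (parquet) path,
        // regardless of the row-writer setting.
        System.out.println(selectHandleFactory(TableType.COPY_ON_WRITE, false));
        System.out.println(selectHandleFactory(TableType.COPY_ON_WRITE, true));
        // MOR tables may still use the append (log-file) path.
        System.out.println(selectHandleFactory(TableType.MERGE_ON_READ, false));
    }
}
```

The key point is that table type, not just the row-writer flag, decides whether the append path is legal.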
Impact
No
Risk level (write none, low, medium or high below)
Low
Documentation Update
No need
Contributor's checklist