apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Hudi Metadata Compaction is not happening #11535

Open Jason-liujc opened 1 week ago

Jason-liujc commented 1 week ago

Describe the problem you faced

This is the follow up GH issue from a slack channel conversation.

Basically, we are seeing that metadata table compaction is not happening correctly for some of our tables.

This leads to errors like:

Caused by: org.apache.hudi.exception.HoodieMetadataException: Metadata table's deltacommits exceeded 1000: this is likely caused by a pending instant in the data table. Resolve the pending instant or adjust `hoodie.metadata.max.deltacommits.when_pending`, then restart the pipeline.

To Reproduce

Steps to reproduce the behavior:

  1. See the commits listed in the Hudi table's `.hoodie` folder in the attachment.

Expected behavior

Compaction should happen periodically, removing delta commits and compacting them, which keeps the metadata table size down. This should happen without having to increase the `hoodie.metadata.max.deltacommits.when_pending` parameter.
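For reference, a minimal sketch of the knobs involved, assuming a Spark DataFrame write from a spark-shell with the Hudi bundle on the classpath (`df`, `basePath`, and the key/precombine fields are placeholders; the threshold values shown are the defaults as I understand them, not our production settings):

```scala
import org.apache.spark.sql.SaveMode

val basePath = "s3://bucket/path/to/table"          // placeholder table path
val df = spark.read.parquet("s3://bucket/incoming") // placeholder input batch

// MDT compaction is managed internally by Hudi; these options control when
// it is expected to trigger and when writes start failing instead.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")                      // placeholder
  .option("hoodie.datasource.write.recordkey.field", "id")      // placeholder
  .option("hoodie.datasource.write.precombine.field", "ts")     // placeholder
  // compact the metadata table after this many deltacommits accumulate
  .option("hoodie.metadata.compact.max.delta.commits", "10")
  // the guard from the error above, hit when pending instants block compaction
  .option("hoodie.metadata.max.deltacommits.when_pending", "1000")
  .mode(SaveMode.Append)
  .save(basePath)
```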

Environment Description

Running on AWS EMR 6.15

Additional context

Had an offline conversation with Aditya and Shiyan.

This is also a multi-writer use case.

Stacktrace

Immediate error stack trace:

Caused by: org.apache.hudi.exception.HoodieMetadataException: Metadata table's deltacommits exceeded 1000: this is likely caused by a pending instant in the data table. Resolve the pending instant or adjust `hoodie.metadata.max.deltacommits.when_pending`, then restart the pipeline.
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.checkNumDeltaCommits(HoodieBackedTableMetadataWriter.java:815)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateTimelineBeforeSchedulingCompaction(HoodieBackedTableMetadataWriter.java:1337)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.performTableServices(HoodieBackedTableMetadataWriter.java:1236)
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:290)

For a certain table, when I tried to remove the metadata table and initialize it again, I saw this error:

24/06/15 01:06:20 ip-10-0-157-87 WARN HoodieBackedTableMetadataWriter: Cannot initialize metadata table as operation(s) are in progress on the dataset: [[==>20240523221631416__commit__INFLIGHT__20240523224939000], [==>20240523225648799__commit__INFLIGHT__20240523232254000], [==>20240524111304660__commit__INFLIGHT__20240524142426000], [==>20240524235127638__commit__INFLIGHT__20240525000640000], [==>20240525005114829__commit__INFLIGHT__20240525011802000], [==>20240525065356540__commit__INFLIGHT__20240525071004000], [==>20240525170219523__commit__INFLIGHT__20240525192315000], [==>20240527184608604__commit__INFLIGHT__20240527190327000], [==>20240528190417601__commit__INFLIGHT__20240528192418000], [==>20240529054718316__commit__INFLIGHT__20240529060542000], [==>20240530125710177__commit__INFLIGHT__20240531081522000], [==>20240530234238360__commit__INFLIGHT__20240530234726000], [==>20240531082713041__commit__REQUESTED__20240531082715000], [==>20240601164223688__commit__INFLIGHT__20240601190853000], [==>20240602072248313__commit__INFLIGHT__20240603005951000], [==>20240603010859993__commit__INFLIGHT__20240603100305000], [==>20240604043334594__commit__INFLIGHT__20240604061732000], [==>20240605061406367__commit__REQUESTED__20240605061412000], [==>20240605063936872__commit__REQUESTED__20240605063943000], [==>20240605071904045__commit__REQUESTED__20240605071910000], [==>20240605074456040__commit__REQUESTED__20240605074502000], [==>20240605082437667__commit__REQUESTED__20240605082443000], [==>20240605085008272__commit__REQUESTED__20240605085014000], [==>20240605123632368__commit__REQUESTED__20240605123638000], [==>20240605130201503__commit__REQUESTED__20240605130207000], [==>20240605134213113__commit__REQUESTED__20240605134219000], [==>20240605140741158__commit__REQUESTED__20240605140747000], [==>20240605144756228__commit__REQUESTED__20240605144802000], [==>20240605151313557__commit__REQUESTED__20240605151319000], [==>20240605195405678__commit__REQUESTED__20240605195411000], [==>20240605202017653__commit__REQUESTED__20240605202023000], [==>20240605205949232__commit__REQUESTED__20240605205955000], [==>20240605212536568__commit__REQUESTED__20240605212542000], [==>20240605220432089__commit__REQUESTED__20240605220438000], [==>20240606152537217__commit__INFLIGHT__20240607031027000], [==>20240606181110800__commit__INFLIGHT__20240608000043000], [==>20240607112530977__commit__INFLIGHT__20240607212013000], [==>20240607213124841__commit__INFLIGHT__20240609024214000], [==>20240608001245366__commit__INFLIGHT__20240609045530000], [==>20240609030620894__commit__INFLIGHT__20240609180310000], [==>20240609181330488__commit__REQUESTED__20240609181336000], [==>20240609194304829__commit__INFLIGHT__20240611095337000], [==>20240611003906613__commit__INFLIGHT__20240611014341000], [==>20240611100258837__commit__INFLIGHT__20240612075536000], [==>20240611174425406__commit__INFLIGHT__20240611184626000], [==>20240612081821910__commit__INFLIGHT__20240612102427000], [==>20240612204659323__commit__REQUESTED__20240612204705000], [==>20240613044301243__commit__INFLIGHT__20240613075101000], [==>20240613085334404__commit__INFLIGHT__20240613105718000], [==>20240613113055212__commit__REQUESTED__20240613113101000], [==>20240613122745696__commit__REQUESTED__20240613122751000], [==>20240614094542418__commit__REQUESTED__20240614094548000], [==>20240614172456990__commit__REQUESTED__20240614172503000], [==>20240614175526954__commit__REQUESTED__20240614175529000], [==>20240614181441857__commit__REQUESTED__20240614181444000], 
[==>20240614222012190__commit__REQUESTED__20240614222015000], [==>20240614225952031__commit__REQUESTED__20240614225954000], [==>20240614235545094__commit__REQUESTED__20240614235547000]]

I was able to temporarily increase the `hoodie.metadata.max.deltacommits.when_pending` parameter, but I think that's only a band-aid.

Are there any CLI commands we can run to fix the metadata table? I've seen previous instances of this failure where people simply deleted the commits, but because we maintain so many tables, it's hard to go into each one, run the CLI, and delete them by hand (the kind of session sketched below).
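For concreteness, the per-table cleanup I'd like to avoid repeating across all our tables looks roughly like this (a sketch only; exact commands and flags vary by Hudi release and the instant ID is a placeholder, so verify against `help` in your CLI version):

```
hudi-> connect --path s3://bucket/path/to/table
hudi-> commits show
hudi-> commit rollback --commit 20240523221631416
hudi-> metadata delete
hudi-> metadata create
```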

Sample Hudi write config we are using:

Map(
  hoodie.datasource.hive_sync.database -> nexus_amazonstore,
  hoodie.datasource.hive_sync.mode -> hms,
  hoodie.datasource.hive_sync.support_timestamp -> true,
  hoodie.datasource.write.precombine.field -> PrepareTime,
  hoodie.datasource.hive_sync.partition_fields -> ShipDayPartition,MarketplaceIdPartition,ChannelPartition,
  hoodie.datasource.write.payload.class -> com.amazon.nexus.datastore.merge.DatalakePartialUpdatePayload,
  hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor,
  hoodie.cleaner.fileversions.retained -> 60,
  hoodie.cleaner.parallelism -> 400,
  hoodie.datasource.hive_sync.table -> prepareoutput_shippingcost,
  hoodie.clean.automatic -> false,
  hoodie.datasource.write.operation -> upsert,
  hoodie.datasource.hive_sync.enable -> true,
  hoodie.datasource.write.recordkey.field -> EntityId,CustomerOrderItemId,
  hoodie.table.name -> prepareoutput_shippingcost,
  hoodie.write.lock.dynamodb.billing_mode -> PAY_PER_REQUEST,
  hoodie.datasource.write.table.type -> COPY_ON_WRITE,
  hoodie.datasource.write.hive_style_partitioning -> true,
  hoodie.write.lock.dynamodb.endpoint_url -> dynamodb.us-east-1.amazonaws.com,
  hoodie.write.lock.dynamodb.partition_key -> PrepareOutput-ShippingCost-NA-,
  hoodie.cleaner.policy -> KEEP_LATEST_FILE_VERSIONS,
  hoodie.database.name -> nexus_amazonstore,
  hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator,
  hoodie.cleaner.policy.failed.writes -> LAZY,
  hoodie.write.lock.dynamodb.table -> DatastoreWriteLockTable,
  hoodie.write.lock.provider -> com.amazon.nexus.datastore.lock.DynamoDBBasedLockProvider,
  hoodie.datasource.write.partitionpath.field -> ShipDayPartition,MarketplaceIdPartition,ChannelPartition,
  hoodie.compaction.payload.class -> com.amazon.nexus.datastore.merge.DatalakePartialUpdatePayload,
  hoodie.write.concurrency.mode -> optimistic_concurrency_control,
  hoodie.write.lock.dynamodb.region -> us-east-1
)
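(For completeness, this option map is applied to the writer roughly like the sketch below; `hudiWriteConfig` stands for the Map above with entries elided, and `df`/`basePath` are the placeholders from the earlier sketch:)

```scala
import org.apache.spark.sql.SaveMode

// Bind the option map shown above to a val and pass it to a datasource write.
val hudiWriteConfig: Map[String, String] = Map(
  "hoodie.table.name" -> "prepareoutput_shippingcost" // ...plus the rest of the entries above
)

df.write.format("hudi")
  .options(hudiWriteConfig)
  .mode(SaveMode.Append)
  .save(basePath) // the table's base path, e.g. an S3 URI
```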
danny0405 commented 1 week ago

This is a known issue. It is probably because you have enabled an async table service on the data table: the 0.x Hudi metadata table does not work with any async table services, which causes this MDT-not-compacting issue. It is fixed on master now, with our new completion-time-based file slicing and non-blocking-style concurrency control.

ad1happy2go commented 1 week ago

@Jason-liujc Did you check your metadata table timeline? Do you see that many successful compaction commits recently? Do you see any blocking instants in the timeline?

Jason-liujc commented 6 days ago

@danny0405 Ah, gotcha. We do have an async cleaner that runs for our Hudi tables.

@ad1happy2go I don't see any compaction on the metadata table since a given date (I believe that's when we moved Hudi cleaning from sync to async, which matches Danny's comment). When I delete the metadata table and try to reinitialize it, I see this error, which I believe lists the blocking instants:

(same `Cannot initialize metadata table as operation(s) are in progress on the dataset` warning as shown above, listing the pending INFLIGHT and REQUESTED commits)

I guess my next questions are:

  1. Is there a way to run compaction of the metadata table asynchronously, without cleaning up commits and without deleting and recreating the metadata table? That process is fairly expensive, and based on what Danny said, metadata table compaction still won't work going forward anyway.

  2. Also, if we just increase the `hoodie.metadata.max.deltacommits.when_pending` parameter to, say, 1000000 (see the sketch below), what kind of performance hit should we expect? Is it mostly at the S3 file-listing level?
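(The band-aid in question is just raising that guard on the write path; a sketch, with the value purely illustrative and `df`/`basePath`/`hudiWriteConfig` reused from the sketches above:)

```scala
import org.apache.spark.sql.SaveMode

// Mitigation only: raising the guard lets writes keep succeeding, but the
// MDT keeps accumulating deltacommits (and file-listing/merge cost) until
// the pending data-table instants are actually resolved.
df.write.format("hudi")
  .options(hudiWriteConfig)
  .option("hoodie.metadata.max.deltacommits.when_pending", "1000000")
  .mode(SaveMode.Append)
  .save(basePath)
```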

danny0405 commented 4 days ago

Did you check whether the data table has a long-pending instant that never finishes? Are there any other async table services running on the data table?

xushiyan commented 3 days ago

> run compaction of the metadata table asynchronously

No option to do that, as MT compaction is managed internally.

> hoodie.metadata.max.deltacommits.when_pending parameter to say like 1000000

@Jason-liujc this is only a mitigation strategy. To get the MT to compact, you need to resolve the pending commits (let them finish or roll them back) on the data table's timeline. If you email us the zipped `.hoodie/`, we can help analyze it.
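(Something like the following sketch can list the pending instants to resolve, assuming a spark-shell with the Hudi bundle on the classpath; API names are per the 0.x line and the path is a placeholder, so treat this as an assumption to verify:)

```scala
import org.apache.hudi.common.table.HoodieTableMetaClient
import scala.collection.JavaConverters._

// Sketch: list pending (requested/inflight) instants on the data table's
// active timeline; these are what block MDT compaction.
val metaClient = HoodieTableMetaClient.builder()
  .setConf(spark.sparkContext.hadoopConfiguration)
  .setBasePath("s3://bucket/path/to/table") // placeholder base path
  .build()

metaClient.getActiveTimeline
  .filterInflightsAndRequested()
  .getInstants
  .iterator()
  .asScala
  .foreach(i => println(s"${i.getTimestamp} ${i.getAction} ${i.getState}"))
```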

Jason-liujc commented 12 hours ago

Had an offline discussion with Shiyan.

As long as the metadata table is not compacted properly, insert performance will gradually get worse and worse.

Here are the action items we are taking:

  1. For future Hudi issues, we'll create GitHub issues first. I'll create another one for some incremental query errors (but those are fully mitigable on our end).
  2. For this specific issue of the metadata table not being compacted, we'll try the following:
     a. Run scripts to delete previous uncommitted instants (and any files they created, if any) and see if metadata compaction resumes.
     b. Run the workload with synchronous cleaning to see if it can compact the metadata table (see the sketch after this list).
     c. After cleaning up the pending commits, see if we can successfully reinitialize the metadata table.
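For item (b), the change is essentially flipping the cleaner back to synchronous, inline cleaning; a sketch (config keys per Hudi 0.x, values illustrative, `df`/`basePath`/`hudiWriteConfig` as in the sketches above):

```scala
import org.apache.spark.sql.SaveMode

// Sketch for item (b): clean inline with the write instead of via the
// external async cleaner, so no async table service overlaps with MDT
// compaction.
df.write.format("hudi")
  .options(hudiWriteConfig)
  .option("hoodie.clean.automatic", "true") // was false in our config
  .option("hoodie.clean.async", "false")    // run the cleaner synchronously
  .mode(SaveMode.Append)
  .save(basePath)
```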

Will give an update here on how it goes.

danny0405 commented 5 hours ago

@Jason-liujc Thanks for trying these, but from a high level we should definitely simplify the design of the MDT. As of 1.x, MDT compaction can work smoothly with any async table service; the next step is to make it fully NB-CC.