Open neerajpadarthi opened 3 months ago
@neerajpadarthi Is it possible to upgrade your Hudi version? Or, if staying on 0.11.0, can you disable the timeline server?
"hoodie.embed.timeline.server", "false"
Hi @ad1happy2go, thanks for checking.
@neerajpadarthi I suggest upgrading to EMR 7.1 with the Hudi 0.14 amzn version, which is closest to OSS 0.15.
Hi @xushiyan, upgrading to Hudi 0.12 or higher is a significant change for our customers because it does not support Redshift integration, and it also requires us to verify the integrations with other services/use cases.
Could you kindly explain the potential effects of disabling the timeline server during writes in 0.11, as suggested by @ad1happy2go? Based on this, we will make the trade-off decision. Kindly let us know. Thanks.
@neerajpadarthi You can go through this blog written by @nsivabalan
https://medium.com/@simpsons/timeline-server-in-apache-hudi-b5be25f85e47
Thanks for sharing this link. In our case, the upserts are relatively small, affecting only a few files (10s–100s), so based on the link's benchmarking details we should be fine using direct markers. However, for the first bulk insert load, we will enable the timeline server to take advantage of timeline-server batching and prevent S3 throttling errors.
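The split described above (direct markers for small upserts, timeline-server-based markers for the initial bulk insert) can be sketched as a small helper. Assumption: `hoodie.write.markers.type` is the Hudi config controlling the marker mechanism, with values `DIRECT` and `TIMELINE_SERVER_BASED` per the Hudi docs.

```python
# Sketch, not an official Hudi API: choose a marker strategy per workload.
def marker_options(bulk_insert: bool) -> dict:
    """Timeline-server-based markers for the large initial bulk insert
    (batches S3 calls, avoiding throttling); direct markers for the
    small routine upserts."""
    marker_type = "TIMELINE_SERVER_BASED" if bulk_insert else "DIRECT"
    return {"hoodie.write.markers.type": marker_type}
```

The returned dict would be merged into the other Hudi write options for the corresponding job.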
@ad1happy2go - Is the timeout issue fixed in 0.14? We will evaluate once Redshift supports 0.14. Please let us know. Also, I am observing the below difference in the indexing stage before and after the marker change during upserts. Can you please help me understand the runtime difference? I am guessing the "Load latest base files from all partitions" job finished quickly with direct markers because it didn't have the overhead of setting up the timeline server, but I see the "Obtain key ranges for file slices (range pruning=on)" job taking significantly longer after disabling the timeline marker server. Can you please help me understand the difference in range pruning with direct vs. timeline-server markers?
Timeline-server markers:
Direct markers:
I am facing a similar issue. I am using Glue 4.0 with Hudi 0.14. The very first run, when the table does not exist in the Glue catalog, works fine. However, over time the Glue job hangs and never finishes. I see errors similar to the below.
"Failure Reason": "Error checking presence of partition meta file for s3a:
Since multiple jobs are writing to the same table, I have enabled locks; below is the Hudi config I am using:
"hudi_options" : {
'hoodie.table.cdc.enabled':'true',
'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after',
'hoodie.datasource.write.recordkey.field': 'uuid',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
'hoodie.table.name': "transact_table",
'hoodie.datasource.write.table.name': "transact_table",
'hoodie.datasource.hive_sync.table': "transact_table",
'hoodie.datasource.hive_sync.database': "default",
'hoodie.datasource.write.partitionpath.field': 'a,b,c',
'hoodie.datasource.hive_sync.partition_fields': 'a,b,c',
'hoodie.datasource.write.hive_style_partitioning': 'true',
'hoodie.datasource.hive_sync.enable': 'true',
'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.datasource.write.precombine.field': 'uuid',
'hoodie.datasource.hive_sync.use_jdbc': 'false',
'hoodie.datasource.hive_sync.mode': 'hms',
'hoodie.datasource.hive_sync.support_timestamp': 'true',
'hoodie.write.concurrency.mode': 'OPTIMISTIC_CONCURRENCY_CONTROL',
'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': 'hudi_locks_table',
'hoodie.write.lock.dynamodb.region': 'us-west-2',
'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-west-2.amazonaws.com',
'hoodie.write.lock.dynamodb.partition_key': "transact_table",
'hoodie.cleaner.policy.failed.writes':'LAZY',
'hoodie.keep.min.commits': 10,
'hoodie.keep.max.commits': 20,
'hoodie.cleaner.commits.retained': 9
}
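For a multi-writer setup like the one above, a quick consistency check on the config dict can catch mismatched settings before submitting the Glue job. This is a hypothetical helper, not a Hudi API; the key names are taken from the config above.

```python
# Hypothetical sanity check (not a Hudi API): optimistic concurrency
# control needs an explicit lock provider and LAZY failed-writes cleaning.
def validate_multiwriter(opts: dict) -> bool:
    return (
        opts.get("hoodie.write.concurrency.mode") == "OPTIMISTIC_CONCURRENCY_CONTROL"
        and "hoodie.write.lock.provider" in opts
        and opts.get("hoodie.cleaner.policy.failed.writes") == "LAZY"
    )
```

Running it against the options dict above (before calling `df.write.format("hudi").options(**hudi_options)...`) would return True; dropping the lock provider would flag the config as inconsistent.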
I did disable the timeline server, and the job still hangs and doesn't succeed or fail.
@keerthiskating Do we know which stage is getting stuck? Did you check the driver logs and the Spark UI?
I haven't checked the Spark UI for the job where I was trying to read the metadata table. But the actual job that is upserting to the Hudi table is getting stuck at the "Preparing compaction metadata" step.
@keerthiskating Looks like we are already discussing this here - https://github.com/apache/hudi/issues/11712
Describe the problem you faced
I encountered an issue using EMR 6.7 with Hudi version 0.11.0 where the Hudi upsert job did not terminate gracefully and remained stuck in a running state indefinitely, despite the upsert operation having completed. It seems like an issue with the post-commit actions, but forcefully terminating the job and retrying succeeded. The configurations and stacktrace details are provided below.
Hudi Configs
Stacktrace
Add the stacktrace of the error.
Please check and let me know if you need any additional details. Thanks