veenaypatil closed this issue 2 years ago
cc @vinothchandar @xushiyan
@YannByron Thanks for all the great contributions! Do you have any clues here? :)
@veenaypatil
To confirm: you are using KEEP_LATEST_COMMITS as the cleaner policy, with CLEANER_COMMITS_RETAINED set to 120?
Alternatively, could you share all of your cleaner options?
@YannByron that's right, these are the hoodie configs set for the streaming job
hoodieConfigs:
hoodie.datasource.write.operation: upsert
hoodie.datasource.write.table.type: MERGE_ON_READ
hoodie.datasource.write.partitionpath.field: ""
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.NonPartitionedExtractor
hoodie.parquet.max.file.size: 6110612736
hoodie.compact.inline: true
hoodie.compact.inline.max.delta.seconds: 3000
hoodie.commits.archival.batch: 5
hoodie.clean.automatic: true
hoodie.clean.async: true
hoodie.cleaner.policy: KEEP_LATEST_COMMITS
hoodie.cleaner.commits.retained: 120
hoodie.keep.min.commits: 130
hoodie.keep.max.commits: 131
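For anyone trying to reproduce this setup, here is a minimal sketch (my own, not from the report) of how these cleaner options could be wired into a Hudi Structured Streaming writer. The input source, table name, base path, checkpoint location, and key/precombine fields are hypothetical placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("hudi-cleaner-sketch").getOrCreate()

// Placeholder input: the built-in rate source emits (timestamp, value) rows.
val input = spark.readStream
  .format("rate")
  .load()
  .withColumnRenamed("value", "uuid")

input.writeStream
  .format("hudi")
  .option("hoodie.table.name", "example_table")                // hypothetical
  .option("hoodie.datasource.write.recordkey.field", "uuid")   // hypothetical
  .option("hoodie.datasource.write.precombine.field", "timestamp")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.clean.automatic", "true")
  .option("hoodie.clean.async", "true")
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
  .option("hoodie.cleaner.commits.retained", "120")
  .option("hoodie.keep.min.commits", "130")
  .option("hoodie.keep.max.commits", "131")
  .option("checkpointLocation", "/tmp/hudi_ckpt")              // hypothetical
  .trigger(Trigger.ProcessingTime("10 minutes"))
  .start("/tmp/hudi/example_table")                            // hypothetical path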
Once the user migrated the code to Spark 3, the ETL is running fine; it seems like an issue with Spark 2 caching then.
@veenaypatil which Spark 2.x version were you using exactly? Hudi supports 2.4+
@xushiyan we were on version 2.3.2 on the older cluster; on the new one it is 3.0.2, where it worked. I am closing this issue as the ETL is working after migrating to Spark 3.x.
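For others who hit the same symptom but cannot move off Spark 2 right away, one possible mitigation (an assumption on my part, not something verified in this thread) is to refresh Spark's cached file listing for the table before re-reading it, so a long-running job does not keep referencing files the Hudi cleaner has since deleted:

// Possible Spark 2.x mitigation (unverified here): drop the cached file
// listing so the next read re-lists files instead of reusing stale paths.
spark.catalog.refreshTable("db.example_table")   // hypothetical table name

// Equivalent SQL form:
spark.sql("REFRESH TABLE db.example_table")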
Describe the problem you faced
We are getting the following error in production for one of the end users' ETLs.
We had faced the same issue earlier and mitigated it by increasing the cleaner's retained commits to 120 in the Spark streaming job that writes to this location. For reference, the streaming job has a batch interval of 10 minutes; on average, batches complete in 4 minutes, and compaction, which is triggered after 4 commits, takes 40-50 minutes, so we retain roughly 8 hours of commits.
The user is running the ETL on Spark 2.x, using a combination of Spark SQL and Spark Core.
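As a back-of-the-envelope check on why raising the retained-commit count helps (my own reasoning, not from the report): a concurrent reader can only rely on file versions for roughly the number of retained commits times the commit interval.

// Rough retention-window arithmetic (my estimate), assuming about one
// commit per streaming batch:
val batchIntervalMinutes = 10    // streaming batch interval
val commitsRetained      = 120   // hoodie.cleaner.commits.retained
val windowHours = commitsRetained * batchIntervalMinutes / 60.0  // = 20.0
// A reader job (or cached file listing) older than this window risks hitting
// files the cleaner has already deleted; the ~8 h figure quoted above is
// presumably lower because compaction and other instants also count toward
// the retained total.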
Environment Description
Hudi version : 0.8
Spark version : 2.x
Hive version : 3.x
Hadoop version : 2.7
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
The above configs are from the older cluster where the ETL ran. All other ETLs running on Spark 3 and Hive 3 are fine. As mentioned earlier, one of the ETLs had also failed on the newer cluster, but after increasing the cleaner commits configs it has not failed there since.
Stacktrace