Open jonathantransb opened 7 months ago
@jonathantransb Thanks for raising this, and sorry for the delay. If you have had a chance to try it, do you also see this issue with OSS Hudi and without the Glue catalog? I will look into it and get back to you.
@ad1happy2go Thank you for handling this. I haven't tried the settings without the Glue catalog yet.
@jonathantransb Sorry for the delay here, I missed it. I will work on this and get back to you soon.
@jonathantransb Are you still facing this issue? Would it be possible to hop on a call so I can understand it better? I tried jobs with 0.14 but wasn't able to reproduce any such issue.
Describe the problem you faced
I'm attempting to read a Hudi table on Glue Catalog using SparkSQL with metadata enabled. However, my job appears to hang indefinitely at a certain step. Despite enabling DEBUG logs, I'm unable to find any indication of what may be causing this issue. Notably, this problem only occurs with Hudi tables where clean is the latest action in the timeline.
To Reproduce
Steps to reproduce the behavior:
Create a Hudi table where clean is the latest action in the timeline
Open spark-shell
Run spark.sql():
Expected behavior
Spark job can read the table without hanging
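To tell which case a given table falls into, the file names in the table's .hoodie directory can be inspected. The following is only a minimal sketch, not part of the original report: it assumes the 0.14-style naming where a completed instant is stored as "<instant time>.<action>" (for example "20231101120000000.clean"), pending instants carry a ".requested" or ".inflight" suffix, and all instant times use the same digit format. The helper name latest_completed_action and the sample file names are hypothetical.

```python
import re

# Assumption: a completed instant is named "<instant time>.<action>",
# e.g. "20231101120000000.clean". Names with a second suffix such as
# "<instant time>.clean.requested" do not match this pattern.
COMPLETED_INSTANT = re.compile(r"^(\d+)\.([a-z]+)$")

# "<instant time>.inflight" and "<instant time>.requested" are treated as
# pending states, not completed actions.
PENDING_STATES = {"requested", "inflight"}


def latest_completed_action(file_names):
    """Return the action of the newest completed instant, or None."""
    latest = None  # (instant_time, action)
    for name in file_names:
        match = COMPLETED_INSTANT.match(name)
        if match is None:
            continue  # e.g. hoodie.properties or pending instants
        instant_time, action = match.group(1), match.group(2)
        if action in PENDING_STATES:
            continue
        # String comparison is enough when all instant times share one format.
        if latest is None or instant_time > latest[0]:
            latest = (instant_time, action)
    return latest[1] if latest else None


# Hypothetical listing of a .hoodie directory.
sample = [
    "hoodie.properties",
    "20231101120000000.commit",
    "20231101120000000.inflight",
    "20231101130000000.clean",
    "20231101130000000.clean.requested",
]
print(latest_completed_action(sample))
```

If this returns clean for a table, it matches the case described above where the read hangs; if it returns commit, it matches the case that reads normally.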
Environment Description
Hudi version : 0.14.0
Spark version : 3.4.1
Hive version : 2.3.9
Hadoop version : 3.3.6
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : yes
Additional context
I encountered no issues while using Hudi version 0.13.1. However, upon trying the new Hudi 0.14.0 version, I experienced this problem.
The driver pod consistently uses up to 1 CPU core, although I'm uncertain about the processes that are running:
For tables where commit is the latest action in the timeline, Hudi 0.14.0 can read the table without any hanging issues.
Stacktrace