zyclove opened this issue 10 months ago
@zyclove I tried the same scenario and it worked fine for me; the queries ran without issue. Can you try the same steps in your setup? Is this only happening for one table?
CREATE TABLE hudi_table (
  ts BIGINT,
  uuid STRING,
  rider STRING,
  driver STRING,
  fare DECIMAL(10,4),
  city STRING
) USING HUDI
TBLPROPERTIES (
  type = 'mor', primaryKey = 'uuid', preCombineField = 'ts'
)
PARTITIONED BY (city);
INSERT INTO hudi_table
VALUES
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',100001.0001,'san_francisco'),
(1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',100001.0001,'san_francisco');
INSERT INTO hudi_table
VALUES
(1695159649089,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',100001.0001,'san_francisco'),
(1695091554790,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',100001.0001,'san_francisco');
INSERT INTO hudi_table
VALUES
(1695159649091,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',100001.0001,'san_francisco'),
(1695091554790,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',100001.0001,'san_francisco');
SELECT count(1) FROM
hudi_table_changes('hudi_table_rt', 'latest_state', '20231114033500000', '20231116152700000');
SELECT count(1) FROM
hudi_table_changes('hudi_table_rt', 'latest_state', '20231114033500000', '20231116152700000');
SELECT count(1) FROM
hudi_table_changes('hudi_table_rt', 'latest_state', '20231114033500000', '20231116152700000');
I cannot reproduce it either.
--executor-memory 4G --executor-cores 2
This can be too small for a large table. `hudi_table_changes` is just a wrapper on top of the Spark data source incremental query; do you see the same issue with that query as well?
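For comparison, the underlying Spark data source incremental query can be issued directly from spark-shell. A minimal sketch, assuming the instant-time range from the reproduction above; the table base path is a placeholder you would need to replace:

```scala
// Incremental read through the Spark data source, which
// hudi_table_changes wraps. Instant times are from the repro above;
// the load() path is a placeholder for the table's base path.
val df = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "incremental").
  option("hoodie.datasource.read.begin.instanttime", "20231114033500000").
  option("hoodie.datasource.read.end.instanttime", "20231114040500000").
  load("s3://<bucket>/<path-to>/hudi_table")

df.count()
```

If this form also hangs after returning the count, the problem is in the incremental query path rather than the table-valued function itself.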
Describe the problem you faced
After running a `hudi_table_changes` query in spark-sql, the query returns its results normally, but the session then hangs and cannot be exited.
To Reproduce
Steps to reproduce the behavior:
1. spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.14.0 \
  --master yarn --driver-memory 8g --num-executors 10 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --executor-memory 4G --executor-cores 2 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar \
  --conf spark.sql.autoBroadcastJoinThreshold=2G \
  --conf spark.memory.storageFraction=0.5 \
  --conf spark.sql.broadcastTimeout=60000 \
  --conf spark.yarn.priority=5 \
  --conf spark.sql.broadcastTimeout=600000 \
  --conf spark.network.timeout=600000s \
  --conf spark.eventLog.enable=false \
  --conf spark.driver.maxResultSize=4g \
  --conf spark.driver.extraJavaOptions=-XX:-UseGCOverheadLimit \
  --conf spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit \
  --name zyc_test \
  --conf spark.dynamicAllocation.enabled=false
2. SELECT count(1) FROM hudi_table_changes('bi_ods_real.ods_log_smart_datapoint_report_batch_rt', 'latest_state', '20231114033500000', '20231114040500000');
3. The query returns its results normally, but the session then gets stuck and won't exit; it is hard to interrupt even with Ctrl+C.
Expected behavior
The spark-sql session should exit cleanly after the query completes, instead of hanging.
Environment Description
Hudi version: 0.14.0
Spark version: 3.2.1
Hive version: 3.1.3
Hadoop version: 3.2.2
Storage (HDFS/S3/GCS..): S3
Running on Docker? (yes/no): no