apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] FileNotFoundException when clustering #10601

Open echisan opened 8 months ago

echisan commented 8 months ago


Describe the problem you faced

I am not sure why the Parquet file is missing; the Flink job did not restart. I would like to know how to handle this issue. Is it possible to ignore the missing file?

To Reproduce

Steps to reproduce the behavior:

1. Set up a Flink SQL job with Kafka as the data source.
2. Configure the job to write data into a Hudi COPY_ON_WRITE table with online clustering (the table DDL is included under Additional context below).
3. Execute the job.

Expected behavior

Online clustering completes successfully; the job should not fail with a FileNotFoundException for a base file.

Environment Description

* Hudi version : 0.13.1-rc1 (hudi-flink1.16-bundle)
* Flink version : 1.16.2
* Hadoop version : 2.9.2 (hadoop-aws)
* Storage : S3 (s3a)
* Table type : COPY_ON_WRITE

Additional context

CREATE TABLE ods_mqtt_msg(
  ....
  dt STRING,
  PRIMARY KEY (`field1`, `field2`, `field3`) NOT ENFORCED
)
PARTITIONED BY (`dt`)
WITH (
  'connector' = 'hudi',
  'table.type' = 'COPY_ON_WRITE',
  'path' = 's3a:///lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg',
  'write.operation' = 'INSERT',
  'clustering.async.enabled' = 'true',
  'clustering.schedule.enabled' = 'true',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://hive-metastore-svc.hms.svc:9083',
  'read.streaming.enabled' = 'true',
  'write.tasks' = '4'
);
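
For what it's worth, the log below shows the async cleaner being spawned immediately before the clustering task fails, and 658 files sitting in pending clustering operations. A minimal tuning sketch, assuming (unconfirmed) that the cleaner reclaimed a base file still referenced by a pending clustering plan; the option keys are from Hudi's FlinkOptions in the 0.13.x line, and the values are illustrative only:

ALTER TABLE ods_mqtt_msg SET (
  'clean.retain_commits' = '60',     -- default 30; retain enough commits to outlast the clustering backlog
  'clean.async.enabled' = 'false',   -- optionally disable async cleaning while debugging the race
  'clustering.delta_commits' = '2'   -- schedule clustering more often so plans spend less time pending
);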

Stacktrace

2024-02-01 03:56:01,519 INFO  org.apache.hudi.client.HoodieFlinkWriteClient                [] - Cleaner has been spawned already. Waiting for it to finish
2024-02-01 03:56:01,519 INFO  org.apache.hudi.async.AsyncCleanerService                    [] - Waiting for async clean service to finish
2024-02-01 03:56:01,627 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[==>20240201035304158__commit__INFLIGHT]}
2024-02-01 03:56:02,333 INFO  org.apache.hudi.common.util.ClusteringUtils                  [] - Found 658 files in pending clustering operations
2024-02-01 03:56:02,333 INFO  org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView [] - Sending request : (http://169.122.153.67:46386/v1/hoodie/view/compactions/pending/?basepath=s3a%3A%2Flakehouse%2Fhudi%2Fdevice_mqtt_msg%2Fods_mqtt_msg&lastinstantts=20240201035302997&timelinehash=350fb15b2282717446dd396f06ebaf80257ed284589ba906e5c3ccf6701cc223)
2024-02-01 03:56:02,427 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Checking for file exists ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.requested
2024-02-01 03:56:02,564 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new file for toInstant ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.inflight
2024-02-01 03:56:02,677 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20240201035304158__commit__COMPLETED]}
2024-02-01 03:56:02,677 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Execute clustering plan for instant 20240131195932968 as 17 file slices
2024-02-01 03:56:02,937 ERROR org.apache.hudi.sink.clustering.ClusteringOperator           [] - Executor executes action [Execute clustering for instant 20240131195932968 from task 2] error
org.apache.hudi.exception.HoodieClusteringException: Error reading input data for s3a://xxx-bucket/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/2024-01-31/85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet and []
    at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:332) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at java.lang.Iterable.spliterator(Unknown Source) ~[?:?]
    at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:336) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[?:?]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(Unknown Source) ~[?:?]
    at java.util.stream.ReferencePipeline.collect(Unknown Source) ~[?:?]
    at org.apache.hudi.sink.clustering.ClusteringOperator.readRecordsForGroupBaseFiles(ClusteringOperator.java:337) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:237) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$processElement$0(ClusteringOperator.java:189) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://xxx-bucket/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/2024-01-31/85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1931) ~[hadoop-aws-2.9.2.jar:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1822) ~[hadoop-aws-2.9.2.jar:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1763) ~[hadoop-aws-2.9.2.jar:?]
    at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337) ~[flink-sql-parquet-1.16.2.jar:1.16.2]
    at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIteratorInternal(HoodieAvroParquetReader.java:168) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIterator(HoodieAvroParquetReader.java:94) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.io.storage.HoodieAvroParquetReader.getRecordIterator(HoodieAvroParquetReader.java:73) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:329) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
    ... 16 more

[screenshot attached]

ad1happy2go commented 8 months ago

@echisan Can you please share the Hudi timeline (the contents of the .hoodie directory) so we can look into this further?
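
If it helps, one way to pull the timeline without direct access to the .hoodie directory is through Hudi's Spark SQL procedures. A minimal sketch, assuming a Spark session with the Hudi Spark bundle and the Hive-synced table name ods_mqtt_msg (these are Spark SQL procedures, not Flink SQL):

CALL show_commits(table => 'ods_mqtt_msg', limit => 50);  -- recent commit instants
CALL show_clustering(table => 'ods_mqtt_msg');            -- clustering instants and their state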