apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Wrong table path when using Hive to query xxx_rt table before the first compaction #4978

Open ghost opened 2 years ago

ghost commented 2 years ago

Describe the problem you faced

When using Hive to query the xxx_rt table, if there are no parquet files but only log files, we get a wrong table path. But once the parquet files are generated, the table path is correct and we can read the data. Is this expected behavior?

ERROR : Job failed with java.io.FileNotFoundException: File does not exist: hdfs://da-hdfs/tmp/hive/hadoop/90b7d231-0e0a-42e5-a72a-6faad6a9ac89/.hoodie
org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path hdfs://da-hdfs/tmp/hive/hadoop/90b7d231-0e0a-42e5-a72a-6faad6a9ac89/.hoodie
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://da-hdfs/tmp/hive/hadoop/90b7d231-0e0a-42e5-a72a-6faad6a9ac89/.hoodie

Environment Description

xiarixiaoyao commented 2 years ago

@awpengfei Yes, for now this is expected behavior. Before any Hudi function is called, Hive filters out all files that start with '.', so all the log files are filtered out.
You have two ways to solve this problem:
1. Trigger compaction; after compaction, parquet files will be generated.
2. Modify the Hive source code so that it does not filter out .log files (see the sketch below).
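For context: Hadoop's default hidden-file rule drops every path whose name starts with '.' or '_', and Hudi log file names are dot-prefixed, which is why they disappear before any Hudi code runs. Below is a minimal sketch of what option 2 amounts to; the class name and the ".log." check are illustrative assumptions, not the actual Hive change:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Illustrative filter: keep the usual hidden-file behaviour, but make an
// exception for files that look like Hudi log files.
public class HudiLogAwarePathFilter implements PathFilter {

  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    // Default Hadoop rule: names starting with '.' or '_' are treated as hidden.
    boolean hidden = name.startsWith(".") || name.startsWith("_");
    // Hudi log files start with '.' but contain ".log." in their name.
    return !hidden || name.contains(".log.");
  }
}
```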

CrazyBeeline commented 2 years ago

Maybe this can help you @awpengfei @xiarixiaoyao. Modify the source code like this:

org.apache.hudi.hadoop.HoodieParquetInputFormat [proposed changes were attached as screenshots, not preserved here]

org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat [screenshot not preserved]
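Since the screenshots are not preserved, the following is only a rough, hypothetical sketch of the kind of change being discussed: having listStatus() account for file slices that contain only log files, instead of returning an empty listing that makes Hive fall back to its scratch directory as the table path. The subclass name is made up and the body is a placeholder, not the patch from the screenshots:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hudi.hadoop.HoodieParquetInputFormat;

// Hypothetical subclass for illustration; the actual proposal edited
// HoodieParquetInputFormat and HoodieParquetRealtimeInputFormat directly.
public class LogAwareHoodieParquetInputFormat extends HoodieParquetInputFormat {

  @Override
  public FileStatus[] listStatus(JobConf job) throws IOException {
    // Default behaviour: only base (parquet) files of the latest file slices are
    // returned, so a file group with nothing but log files contributes no paths.
    FileStatus[] baseFiles = super.listStatus(job);
    // A real fix would also surface the log files of base-file-less file slices
    // so the realtime input format can merge them at read time.
    return baseFiles;
  }
}
```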

nsivabalan commented 2 years ago

Is this written using Flink? With Spark, we won't create log files directly: base data files are created first, and only then are log files created for the file groups.

nsivabalan commented 2 years ago

But it might be good to get it fixed irrespective of that. Just trying to gauge the use case.

nsivabalan commented 2 years ago

@xiarixiaoyao: Alexey did a revamp of all query engine code paths recently with 0.11. Do we still have this issue after 0.11? Do you have any idea? Do we have a tracking ticket for this?

xiarixiaoyao commented 2 years ago

@nsivabalan I do not think 0.11 solves this problem. @CrazyBeeline thanks for your help. Could you please raise a PR to solve this problem? Thanks very much.

nsivabalan commented 2 years ago

@CrazyBeeline: can you put up a patch with the fix you have? Happy to review and get it landed. By the way, are you using HBase or some other setup? Wondering how you ended up with a file group that has a log file but no base file.

nsivabalan commented 2 years ago

@CrazyBeeline: gentle ping.

nsivabalan commented 1 year ago

@danny0405 @xiarixiaoyao: do we know if this has been fixed at any point?

danny0405 commented 1 year ago

No, we have not fixed it. Neither Hive nor Trino can access a file group with only log files. Can we move this to a higher priority for release 0.13.0 and solve it then?

codope commented 1 year ago

@ad1happy2go to reproduce