ghost opened this issue 2 years ago
@awpengfei yes, for now this is expected behavior. Before calling any Hudi function, Hive filters out all files whose names start with '.', so all the log files are filtered out.
you have two ways to solve this problem:
1. trigger compaction; after compaction, parquet files will be generated
2. modify the Hive source code so it does not filter out .log files
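For option 1, a minimal sketch of the relevant write configs (these are standard Hudi options; the exact values are illustrative, and inline compaction after every delta commit is usually too aggressive for production):

```properties
# Run compaction inline as part of the write path
hoodie.compact.inline=true
# Compact after this many delta commits (1 shown only to force a base
# parquet file quickly for testing; the default is higher)
hoodie.compact.inline.max.delta.commits=1
```

Compaction can also be scheduled and executed out of band (e.g. via hudi-cli or an async compaction job) if inline compaction adds too much latency to ingestion.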
maybe this can help you @awpengfei @xiarixiaoyao: modify the source code of these classes:
org.apache.hudi.hadoop.HoodieParquetInputFormat
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
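To illustrate option 2, here is a hedged sketch of the filtering rule being discussed. It is not the actual Hive or Hudi source; the class and method names are hypothetical. Hive's default hidden-file filter rejects any file name starting with '.' or '_', which also drops Hudi log files; a relaxed filter would keep names that look like Hudi log files while still hiding other dot-files:

```java
// Hypothetical helper mirroring Hive's default hidden-file rule,
// relaxed so that Hudi log files are not filtered out.
public class HudiAwareHiddenFileFilter {

    public static boolean accept(String fileName) {
        if (fileName.startsWith("_")) {
            // Still hide marker files like _SUCCESS and _temporary dirs.
            return false;
        }
        if (fileName.startsWith(".")) {
            // Hudi log files start with '.' and contain ".log." in the name,
            // e.g. ".fileId_baseInstant.log.1_1-0-1"; keep only those.
            return fileName.contains(".log.");
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accept(".f1_20220101.log.1_1-0-1")); // true: Hudi log file
        System.out.println(accept("._SUCCESS.crc"));            // false: other dot-file
        System.out.println(accept("f1_1-0-1_20220101.parquet")); // true: base file
    }
}
```

The real fix would have to hook this logic into the listing path of the input formats named above rather than rely on Hive's generic hidden-file filter.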
is this written using Flink? Using Spark, we won't create log files directly: base files are created first, and only then are log files created for file groups.
but it might be good to get it fixed irrespective of that. Just trying to gauge the use case.
@xiarixiaoyao : Alexey did a revamp of all query-engine code paths recently with 0.11. Do we still have this issue after 0.11? Do you have any idea? Do we have a tracking ticket for this?
@nsivabalan I do not think 0.11 solves this problem. @CrazyBeeline thanks for your help. Could you please raise a PR to solve this problem? Thanks very much.
@CrazyBeeline : can you put up a patch with the fix you have? Happy to review and get it to landing. BTW, are you using HBase or some other setup? Wondering how you ended up with a file group that has a log file but no base file.
@CrazyBeeline : gentle ping.
@danny0405 @xiarixiaoyao : do we know if this has been fixed at any point?
No, we have not fixed it. Neither Hive nor Trino can access a file group with pure log files. Can we move it to a higher priority for release 0.13.0 and solve it then?
@ad1happy2go to reproduce
**Describe the problem you faced**

When using Hive to query an xxx_rt table, if there is no parquet file but only log files, we get a wrong table path. But once the parquet files are generated, the table path is correct and we can get the data. Is this expected behavior?
**Environment Description**

* Hudi version : 0.10.1
* Hive version : 3.1.2
* Hadoop version : 3.3.1