NishantBaheti opened this issue 4 months ago
Hi @NishantBaheti, thanks for your feedback. Could you also share the release versions of Spark and Hudi you are using?
Hello, I am using this jar
@NishantBaheti I checked before; incremental queries work fine with 0.14.1.
Can you paste the full reproducible script or the table/writer properties you used to populate the table? Which writer did you use to populate it?
I used the code below to reproduce quickly - https://gist.github.com/ad1happy2go/e7a2f8c695fde4c3db060a7113610931
It doesn't work; that is another issue.
@NishantBaheti Were you able to get it resolved? Can you share the full stack trace? The "Unable to load class" error usually indicates a library conflict.
@ad1happy2go We moved to a MOR table. The COW configurations felt a little unstable, and we had to rush the project to production.
@NishantBaheti Thanks for the update. Surprising that MOR worked but COW didn't work for you.
@ad1happy2go COW tables were failing a lot: at read time we got "parquet file not found", incremental queries didn't work, or we hit the error mentioned above. I'm not saying MOR is perfect, but we still had to put something in production, so we used static MOR configurations with quick compaction and cleaning so that the Athena RO tables behave like Delta tables from the Delta framework and can serve point queries using the record index. I hope they figure out a stable version of Hudi soon, like Delta did.
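As a rough illustration of the "static MOR with quick compaction and cleaning" setup described above, here is a minimal sketch of the writer options involved. The config keys come from the Hudi 0.14 configuration reference; the table name, key/ordering columns, and all values are assumptions, not the actual production settings.

```python
# Hypothetical static MOR writer configuration with aggressive inline
# compaction and cleaning. All values are illustrative assumptions.
mor_writer_options = {
    'hoodie.table.name': 'my_table',                        # hypothetical table name
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.datasource.write.recordkey.field': 'id',        # assumed key column
    'hoodie.datasource.write.precombine.field': 'ts',       # assumed ordering column
    # Compact frequently so the Athena RO view stays close to the latest data.
    'hoodie.compact.inline': 'true',
    'hoodie.compact.inline.max.delta.commits': '2',
    # Clean aggressively to keep the number of retained file versions small.
    'hoodie.cleaner.commits.retained': '4',
    # Record-level index (Hudi 0.14+) to support point lookups by record key.
    'hoodie.metadata.record.index.enable': 'true',
}

# Usage (requires an active SparkSession and an existing DataFrame `df`):
# df.write.format("org.apache.hudi").options(**mor_writer_options) \
#     .mode("append").save(tablePath)
```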
@NishantBaheti For incremental queries you can hit a FileNotFoundException if the files for that query window have already been deleted by the cleaner. You can set hoodie.datasource.read.incr.fallback.fulltablescan.enable to true to work around this.
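Concretely, the fallback flag is just one more entry in the incremental read options. A minimal sketch (the instant times are placeholders, not real commit timestamps):

```python
# Incremental read options with the full-table-scan fallback enabled, so the
# query degrades to a full scan instead of failing with FileNotFoundException
# when the cleaner has already removed the requested file slices.
hudi_options = {
    'hoodie.datasource.query.type': 'incremental',
    'hoodie.datasource.read.begin.instanttime': '20240101000000',  # placeholder instant
    'hoodie.datasource.read.end.instanttime': '20240102000000',    # placeholder instant
    'hoodie.datasource.read.incr.fallback.fulltablescan.enable': 'true',
}

# Usage (requires an active SparkSession and a Hudi table at `tablePath`):
# df = spark.read.format("org.apache.hudi").options(**hudi_options).load(tablePath)
```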
Error

```
Error Category: QUERY_ERROR; AnalysisException: Found duplicate column(s) in the data schema:
_hoodie_commit_seqno, _hoodie_commit_time, _hoodie_file_name, _hoodie_partition_path, _hoodie_record_key
```
Code

```python
hudi_options = {
    'hoodie.datasource.query.type': 'incremental',
    'hoodie.datasource.read.begin.instanttime': start_time,
    'hoodie.datasource.read.end.instanttime': end_time,
}

df = spark.read \
    .format("org.apache.hudi") \
    .options(**hudi_options) \
    .load(tablePath)
```