YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark
https://yotpoltd.github.io/metorikku/
MIT License
583 stars, 155 forks

Does Metorikku support Hudi incremental pull as input? #394

Closed by SpeedxPz 3 years ago

SpeedxPz commented 3 years ago

I'm new to data warehousing and am currently using Metorikku to stream CDC from Kafka and sink it into the data lake as Hudi.

I have to run an ETL process after that. Can Metorikku do an incremental pull from Hudi?

Thanks

lyogev commented 3 years ago

In general, every row in a Hudi table carries several metadata fields that indicate the commit time/ID it was written with. So you could filter on these in a WHERE clause to get the last commit or something similar (you will need to know what you are looking for, though).
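
To make the WHERE idea concrete, here is a rough sketch of the kind of filter this would be. `_hoodie_commit_time` is the Hudi metadata column that holds each row's commit timestamp as a sortable string. Everything else is an assumption for illustration: the `orders` table, the sample rows, and the helper names are made up, and `sqlite3` only stands in for Spark SQL so the snippet runs on its own. The same SQL could presumably be used in a Metorikku step, but that is not tested here.

```python
import sqlite3

# Hudi metadata column holding the commit timestamp of each row.
COMMIT_COL = "_hoodie_commit_time"


def incremental_sql(table: str, last_commit: str) -> str:
    """SQL for rows committed after `last_commit` (hypothetical helper).

    Commit times are timestamp strings, so plain string comparison
    orders them chronologically.
    """
    return f"SELECT * FROM {table} WHERE {COMMIT_COL} > '{last_commit}'"


def latest_commit_sql(table: str) -> str:
    """SQL for rows that belong to the most recent commit only."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {COMMIT_COL} = (SELECT MAX({COMMIT_COL}) FROM {table})"
    )


# Stand-in for a Hudi table: sqlite3 is used only so the sketch runs
# without Spark. Table name and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE orders ({COMMIT_COL} TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("20210101120000", 1), ("20210102120000", 2), ("20210103120000", 3)],
)

# Rows written after the last commit the downstream job already processed.
new_rows = conn.execute(incremental_sql("orders", "20210101120000")).fetchall()

# Rows from the newest commit only.
last_rows = conn.execute(latest_commit_sql("orders")).fetchall()
```

The catch mentioned above still applies: the downstream job has to remember the last commit time it processed (the `last_commit` argument here) and pass it in on the next run.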