lukas-zeman-ABSA closed 2 months ago
We had such an implementation, actually 😄. It was quite slow, so we removed it. But it was a couple of years ago. Maybe now is a good time to revive it.
I found the classes for Delta and want to restore them in the next Pramen version. Currently, though, the implementation uses Delta paths, not tables, because it needs several different subpaths to store different kinds of data. Do you want Delta Lake table support added, or is a path fine?
Well, maybe we could make it work on Databricks with just a path, but saveAsTable would be much better. (It would improve speed and also allow us to store this data in Databricks managed tables.)
Got it, I will add support for tables.
I also want to clarify that Pramen is going to use several tables for bookkeeping. So when this is implemented, you will be able to specify the database and the table prefix in the Delta table configuration.
Something like:
pramen {
bookkeeping.enabled = true
bookkeeping.delta.database = "my_db"
bookkeeping.delta.table.prefix = "bk_"
}
Let me know if this is okay for you.
I checked the implementation. Yes, this would work totally fine, thanks. Theoretically, "database" here means "catalog.schema", but it will work :)
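To illustrate the naming scheme discussed above, here is a minimal sketch of how a database and a table prefix could be combined into full table names. It is not Pramen's actual code: `DeltaBookkeepingConfig`, `fullTableName`, and the logical table name `records` are hypothetical names chosen for this example. It only shows that a two-part value such as `my_catalog.my_schema` can go into the `database` setting unchanged.

```scala
// Hypothetical helper for illustration only; not part of Pramen's API.
final case class DeltaBookkeepingConfig(database: Option[String], tablePrefix: String) {

  /** Builds the full table name for a logical bookkeeping table. */
  def fullTableName(logicalName: String): String = {
    val table = s"$tablePrefix$logicalName"
    database.filter(_.nonEmpty) match {
      case Some(db) => s"$db.$table" // db may be "schema" or "catalog.schema"
      case None     => table         // fall back to the session's current database
    }
  }
}

object DeltaTableNamesExample {
  def main(args: Array[String]): Unit = {
    // Values taken from the configuration snippet in this thread.
    val hiveStyle = DeltaBookkeepingConfig(Some("my_db"), "bk_")
    println(hiveStyle.fullTableName("records")) // my_db.bk_records

    // A "catalog.schema" value in the database setting (assumed example names).
    val catalogStyle = DeltaBookkeepingConfig(Some("my_catalog.my_schema"), "bk_")
    println(catalogStyle.fullTableName("records")) // my_catalog.my_schema.bk_records
  }
}
```

The resulting string is the kind of name that could be passed to Spark's `saveAsTable`, assuming the catalog and schema already exist.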
Add Delta table support to the bookkeeper. This could be used to maintain the metastore in Databricks.
https://github.com/AbsaOSS/pramen/blob/c0bc31219fdcbfe9398cf6a2f0e414278712ec55/pramen/core/src/main/scala/za/co/absa/pramen/core/bookkeeper/Bookkeeper.scala#L110