BauplanLabs / no-jvm-wap-with-iceberg

A write-audit-publish implementation on a data lake without the JVM
MIT License

General Question #4

Open georgezefko opened 1 month ago

georgezefko commented 1 month ago

Hey guys,

Thank you very much for the customised patches you provided. I tested them out and, with some adjustments for my needs, they work very well.

However, I was wondering whether the REST catalogue could be used instead. If you have tried that, I would like to hear about your experience.

So far I haven't managed to configure it successfully with Nessie without Spark.

Thanks again for your contribution.

jacopotagliabue commented 1 month ago

hi @georgezefko, thanks for using this. This open-source project / reference implementation predates the Nessie update that made it compatible with the REST API, but internally at Bauplan we have made some progress in standardizing the APIs in the platform.

@russellromney can definitely chime in with some of our experience - if you want to see how things work in the platform, ping us anytime.

georgezefko commented 1 month ago

@jacopotagliabue Thanks for the quick response.

Do you mean that you have adjusted your approach to use the REST catalogue?

I would be interested to see how you work with it, for learning purposes.

jacopotagliabue commented 1 month ago

yeah, inside the platform we moved to more generic support, even if not all of Nessie's features map 1:1 to the REST spec v1, so there are still some manual API calls to make, etc. I will let @russellromney chime in with some "gotchas" if he can think of anything useful to share from our own implementation!

russellromney commented 1 month ago

@georgezefko thanks for your question! We've recently moved entirely from pynessie to Nessie's catalog REST API (https://app.swaggerhub.com/apis/projectnessie/nessie/0.92.1) for all Nessie catalog operations (except table writes), as we've found it to be more reliable and better documented. It's been a very good experience overall. The only "gotcha" is that you have to be careful about using a "ref" (branch_name@commit_hash) vs. a "branch" (branch_name), and about passing around table names with or without namespaces. For table writes we've started using Nessie's pyiceberg REST implementation, which is available at /iceberg and allows us to consolidate security privileges. I can't show you "real" examples, but the API documentation is excellent.
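To make the "ref" vs. "branch" and namespace gotchas above a bit more concrete, here is a minimal sketch of the kind of small helpers that keep the two forms from being mixed up. This is not Bauplan's code: `NessieRef`, `parse_ref`, `format_ref`, `split_table_name` and `iceberg_rest_properties` are hypothetical names made up for illustration, the host and port in the usage note are placeholders, and the sketch assumes that neither branch names nor namespace levels contain the separator characters (`@` and `.`).

```python
from typing import Dict, NamedTuple, Optional, Tuple


class NessieRef(NamedTuple):
    """A branch name plus an optional commit hash (hypothetical helper type)."""
    branch: str
    commit_hash: Optional[str]  # None means "just the branch", with no pinned commit


def parse_ref(ref: str) -> NessieRef:
    """Split 'branch_name@commit_hash' into its parts; a bare 'branch_name' has no hash."""
    branch, sep, commit_hash = ref.partition("@")
    if not branch:
        raise ValueError(f"missing branch name in ref: {ref!r}")
    if sep and not commit_hash:
        raise ValueError(f"'@' present but commit hash missing in ref: {ref!r}")
    return NessieRef(branch, commit_hash or None)


def format_ref(branch: str, commit_hash: Optional[str] = None) -> str:
    """Build 'branch_name@commit_hash', or just 'branch_name' when no hash is given."""
    return f"{branch}@{commit_hash}" if commit_hash else branch


def split_table_name(
    identifier: str, default_namespace: Optional[str] = None
) -> Tuple[Tuple[str, ...], str]:
    """Return (namespace levels, table name) for a dot-separated identifier.

    Assumption: dots only ever separate levels. A bare table name is only
    accepted when a default namespace is supplied, so the ambiguity is explicit.
    """
    parts = identifier.split(".")
    if any(not part for part in parts):
        raise ValueError(f"empty component in table identifier: {identifier!r}")
    *namespace, table = parts
    if not namespace:
        if default_namespace is None:
            raise ValueError(f"table {identifier!r} has no namespace and no default was given")
        namespace = default_namespace.split(".")
    return tuple(namespace), table


def iceberg_rest_properties(nessie_base_url: str) -> Dict[str, str]:
    """Properties for a pyiceberg REST catalog pointed at the /iceberg path
    mentioned above. The base URL is whatever your Nessie server uses."""
    return {"type": "rest", "uri": nessie_base_url.rstrip("/") + "/iceberg"}
```

With pyiceberg installed and a Nessie server running, the resulting dictionary could be passed along the lines of `load_catalog("nessie", **iceberg_rest_properties("http://localhost:19120"))`, where `load_catalog` comes from `pyiceberg.catalog` and the URL is a placeholder. Normalizing refs and table identifiers at the boundary like this means the rest of the code only ever sees one representation of each.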