Closed tdikland closed 6 months ago
Hi Tim, thank you for submitting this PR!
I haven't delved into the details of the PR yet, but it looks great. That aside, I like your idea of the Delta Sharing Server SDK, the delta-sharing-server-rs project. Since the Delta Sharing protocol is defined through REST APIs, there may be some considerations to keep in mind; nevertheless, the idea is interesting.
As for the cloud-file-signer crate, I believe it would be more beneficial to expand it into a larger project, similar to the delta-sharing-server-rs. If your contribution here serves any purpose, please feel free to incorporate it.
@tdikland
Regarding the open questions, here are my answers:
Thanks for your quick reaction and enthusiasm @ognis1205.
Regarding the delta-sharing-server-rs
project, I think it would be great for users to have both the lib that implements the delta sharing protocol and the batteries included version (which this crate is currently focusing on). The user journey that I'm thinking about is a user starts by deploying the reference implementation of the delta sharing server and when customisation is needed they can take the lib and plugin custom auth, retry, other middleware, different database etc. It may be nice to support this user journey with this project. I will open a new issue to discuss this in more depth.
Regarding the support for the underlying object stores (i.e. AWS, Azure, GCP). Thanks for your insights on my questions. I will be iterating on the implementation and let you know when it's ready for review.
I am very grateful for your contribution @tdikland !
This is a quick review of the current commit. Regarding the 'cloud-file-signer' crate, I believe there is operational experience in your project, but I think it still needs a bit more incubation. Therefore, I believe it's still early to use the crate. Since this project itself is also in incubation, considering the future development speed, I would prefer to have the implementation of signed URLs for each cloud provider done here as much as possible. The experience gained here can then be reflected in 'cloud-file-signer,' and eventually, I would like to use the crate in 'delta-sharing-rs' as well.
Please do not remove utilities::signed_url
(or rename it utilities::cloud_signer
). Instead, can we consider implementing factory methods to instantiate the cloud signer instance from the credentials of each provider (similar to what is done in the bootstrap
module)? I would like a factory method to be added to the struct Utility
in utilities::signed_url
(or utilities::cloud_signer
) module. The reason is that having information in the form of logging, indicating which provider's credentials were loaded during server bootstrap and the design and the implementation would be more consistent with other modules. What do you think?
@tdikland
Regarding the 'cloud-file-signer' crate, even if, for instance, we are not currently using the crate, I am considering stating in the README that we anticipate potential future use of it as a related project.
Thanks for the feedback. I agree it is not yet desirable to depend on the cloud-file-signer
crate. I was experimenting with it and accidentally pushed it to this PR (sorry!). I'll give a heads-up once it is ready for review!
You don't have to apologize for it at all! I apologize if I have caused any disruption to the development. We should have prepared a more accessible system, such as a Code of Conduct (COC), to make participation easier. It was very helpful that you made various suggestions and asked questions.
Allright, I think I got it down now.
Thank you for your cooperation @tdikland :)
I believe we have reached a consensus on the general direction of the implementation. It seems there are still some parts that I would like to discuss though, before proceeding to the review, let's go ahead and run some tests for now. @roeap could you please approve the workflow? (I deliberately commented out the #[tokio::test]
attribute in some unit tests, but I might have added comments to explain. It's my mistake. Let's discuss it later as well)
When you are ready to review, please remove [WIP] from the title. Or do you prefer to conduct detailed reviews gradually?
Hi @ognis1205, @roeap, I'm ready for reviewing.
acquire pre-signed URls
We already support this for S3. Azure and GCP are more complex but would be happy for contributions in this space - https://docs.rs/object_store/latest/object_store/signer/trait.Signer.html
@tdikland @roeap I will review this PR in the next 12 hours, here in Tokyo, it's around 24:00 midnight.
I did not see the discussion around delta-sharing-server-rs, but in the past @ognis1205 and I had discussed that we want to break this up into smaller crates. and split the different "sub-apis" such that one can easily mix-and-match a server that supports the sharing protocol with just the desired features.
Is there any place to discuss this issue? I have some opinions on this since the protocol is defined over REST APIs. I think this project might be helpful:
https://github.com/HeroicKatora/oxide-auth
Depending on the conclusion of our discussion, in some cases, I think it would be acceptable to change the name of this project to something like "delta-sharing-sdk" and have @tdikland and @roeap take the lead in managing the project. This would allow for more efficient development, I believe.
FWIW I've created two upstream tickets on object_store for adding support for Azure and GCP, along with links to the relevant docs. I think they should be relatively straightforward if anyone wanted to pick them up:
@tustvold - i know the azure apis quite well, and signing using keys should (hopefully) be straignforward to leverage the existing signing using the master key ... user delegated are just API cal li think :)
had a quick glance at GCP - might do that second - seems similar, cannonicalize request and sign ...
@tdikland - just FYI, started to add url signing for azure to object store https://github.com/apache/arrow-rs/pull/5259.
@roeap @ognis1205 - FYI I have created two new issues to keep track of the discussion and next steps.
This PR introduces support for sharing Delta Lake tables stored on Azure Blob Storage.
I've marked the PR as WIP, since I feel there are still a couple of open questions that we may want to address in this or subsequent issues.
On a higher level:
https://crates.io/crates/cloud-file-signer
. This crate defines aCloudFileSigner
trait and implements it for a couple of popular object stores (S3, Azure Blob Storage, GCS) and could be extended to other stores (R2, MinIO, etc.)Looking forward to some feedback!
Closes #18