Open edmondop opened 1 week ago
@alamb since the linked PR is closed, can we mark this as closed?
This is new, and although related to the linked issues, it is not closed
For additional context see https://github.com/apache/arrow-rs/issues/5143. Copying some of the info here:
I think the use case this feature would support is polars: polars does not provide any way to modify / configure S3 connections at runtime. Since the users don't control the pola.rs source or distribution, they cannot use the existing object_store CredentialProvider trait.
The proposal on this ticket is to add a mechanism that can call out to an external program / process to get credentials. While less efficient, this would allow someone to plug in whatever authentication mechanism they wanted without having to change the source code.
@tustvold notes that we need to ensure this type of mechanism does not compromise system security (e.g. perhaps it has to be enabled by default).
Also, he mentioned that the Azure client has something similar, MicrosoftAzureBuilder::with_use_azure_cli, that we could use as a model.
See also related ticket in pola-rs that @tustvold filed: https://github.com/pola-rs/polars/issues/18979
To be clear, I view this very much as a hack around an API limitation in Polars. I would prefer we try to fix this there before resorting to this: https://github.com/pola-rs/polars/issues/18979#issuecomment-2381289706
Can you explain why you think credential process support is related to Polars? To me it is a gap in AmazonS3Builder in object_store.
Your original request concerned supporting a broader range of auth within the context of polars. Credential process support was then proposed as a way to work around the inability to override the credential configuration within polars. By fixing this limitation of polars we not only provide a way for users to use credential process, via an AWS SDK that implements it, but also give them the full flexibility of all the other auth possibilities exposed by these SDKs.
I'd naturally prefer the solution that gives users the most flexibility and avoids needing to revisit this again when someone comes along requesting SSO or similar
I would also like to avoid a similar conversation with users of systems other than polars.
I agree that in an ideal world, polars would perhaps implement user APIs to fully configure S3 auth via object_store using the existing APIs.
Even if they did this, however, I think we will continue to have similar conversations with other downtream users
I view this credential process not as a hack but as a general purpose configuration convention that works for any subsequent user (similarly to how object_store also supports the standard AWS configuration environment variables without any downstream crate configuration).
I think this proposal doesn't meaningfully help in this regard because:
So we'd end up with users continuing to come here asking about this, and we'd have to direct them to some object_store specific environment configuration to call out to some external process they have to set up.
Encouraging the downstreams to expose, or otherwise utilize the object_store credential provider API would avoid this entirely.
Credential process is a flexible solution for providing custom authentication mechanisms for object store. It is described as a part of the AWS SDK documentation and implementing it would allow more complex use cases to be fully supported by the current setup, without adding particular complexity.
How does it work?
When a user opts in to the credential process, the client invokes that process whenever it needs credentials, and the process replies on stdout with JSON in a defined schema.
The client knows when the expiration will occur, and will re-invoke the process when required.
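To make the refresh behaviour concrete, here is a minimal sketch of the caching logic, under some assumptions. In the AWS SDK documentation the process output is a JSON object with `Version`, `AccessKeyId`, `SecretAccessKey`, and optional `SessionToken` and `Expiration` fields. The sketch skips JSON parsing and models only the expiry check. The names `Credentials`, `CachedCredentials`, and the `margin` parameter are hypothetical and are not part of object_store.

```rust
use std::time::{Duration, SystemTime};

/// Credentials as returned by one invocation of the external process.
/// (Hypothetical type for this sketch; the session token is omitted.)
#[derive(Clone, Debug, PartialEq)]
struct Credentials {
    access_key_id: String,
    secret_access_key: String,
    /// `None` means the process did not report an expiration.
    expiration: Option<SystemTime>,
}

/// Caches credentials and calls `fetch` again once they are close to expiry.
struct CachedCredentials<F: FnMut() -> Credentials> {
    fetch: F,
    cached: Option<Credentials>,
    /// Refresh this long before the stated expiration (an assumed safety margin).
    margin: Duration,
}

impl<F: FnMut() -> Credentials> CachedCredentials<F> {
    fn new(fetch: F, margin: Duration) -> Self {
        Self { fetch, cached: None, margin }
    }

    /// Returns credentials valid as of `now`, re-invoking `fetch` if needed.
    fn get(&mut self, now: SystemTime) -> Credentials {
        let fresh = match &self.cached {
            // Cached and expiring: still fresh only while outside the margin.
            Some(c) => match c.expiration {
                Some(exp) => now + self.margin < exp,
                None => true,
            },
            // Nothing cached yet: the process must be invoked.
            None => false,
        };
        if !fresh {
            self.cached = Some((self.fetch)());
        }
        self.cached.clone().expect("populated above")
    }
}
```

Passing `now` explicitly keeps the sketch deterministic. A real client would presumably use the current time, and `fetch` would spawn the configured process and parse its output.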
What can we do?
We can then extend the AmazonS3Builder to support this use case via an environment variable
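As a rough illustration of the opt-in via an environment variable, the sketch below reads a command line from a variable and runs it with `std::process::Command`. The variable name `AWS_CREDENTIAL_PROCESS` and both function names are assumptions made for this example; nothing here is an existing AmazonS3Builder option. The command is split on whitespace and executed directly, without a shell, which is one simple way to limit the security exposure raised earlier in the thread.

```rust
use std::env;
use std::io;
use std::process::Command;

/// Hypothetical variable name, chosen only for this sketch.
const CREDENTIAL_PROCESS_ENV: &str = "AWS_CREDENTIAL_PROCESS";

/// Runs `command_line` and returns its stdout, which is expected to hold
/// the credential JSON. Naive whitespace splitting: no quoting support.
fn run_credential_process(command_line: &str) -> io::Result<String> {
    let mut parts = command_line.split_whitespace();
    let program = parts.next().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "empty credential process command")
    })?;
    // No shell is involved, so the remaining parts are passed as plain arguments.
    let output = Command::new(program).args(parts).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("credential process exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Returns `Ok(None)` when the variable is unset, so the feature stays opt-in.
fn credentials_from_env() -> io::Result<Option<String>> {
    match env::var(CREDENTIAL_PROCESS_ENV) {
        Ok(cmd) => run_credential_process(&cmd).map(Some),
        Err(_) => Ok(None),
    }
}
```

A builder integration would presumably call something like `credentials_from_env` only when no explicit credentials were configured, then parse the returned JSON.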