apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

[object-store]: Implement credential_process support for S3 #6422

Open edmondop opened 1 week ago

edmondop commented 1 week ago

Credential process is a flexible way to provide custom authentication mechanisms for object_store. It is described in the AWS SDK documentation, and implementing it would let the current setup fully support more complex use cases without adding much complexity.

How does it work?

When a user configures a credential process, the client invokes that process whenever it needs credentials. The process replies with JSON in a defined schema, like so:

{
    "Version": 1,
    "AccessKeyId": "an AWS access key",
    "SecretAccessKey": "your AWS secret access key",
    "SessionToken": "the AWS session token for temporary credentials", 
    "Expiration": "RFC3339 timestamp for when the credentials expire"
}  

The client knows when the expiration will occur, and will re-invoke the process when required.
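
As a rough illustration of the expiration handling described above, here is a minimal std-only sketch. The names Credentials, CachedProvider, and refresh_margin are hypothetical and are not part of object_store. The fetch closure stands in for "invoke the process and parse its JSON output", and the caller passes in the current time so the refresh rule is easy to exercise.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical mirror of the JSON fields shown above.
#[derive(Debug, Clone, PartialEq)]
struct Credentials {
    access_key_id: String,          // "AccessKeyId"
    secret_access_key: String,      // "SecretAccessKey"
    session_token: Option<String>,  // "SessionToken"
    expiration: Option<SystemTime>, // "Expiration", already parsed from RFC3339
}

/// Caches the last credentials and re-invokes `fetch` once they are
/// missing or about to expire. `fetch` is a stand-in for running the
/// external process; this is an assumption of the sketch.
struct CachedProvider<F: FnMut() -> Credentials> {
    fetch: F,
    cached: Option<Credentials>,
    refresh_margin: Duration,
}

impl<F: FnMut() -> Credentials> CachedProvider<F> {
    fn new(fetch: F, refresh_margin: Duration) -> Self {
        Self { fetch, cached: None, refresh_margin }
    }

    /// Returns cached credentials, refreshing them when `now` is within
    /// `refresh_margin` of the expiration (or nothing is cached yet).
    fn get(&mut self, now: SystemTime) -> &Credentials {
        let stale = match &self.cached {
            None => true,
            Some(creds) => match creds.expiration {
                Some(expiry) => now + self.refresh_margin >= expiry,
                // No expiration reported: treat as long-lived credentials.
                None => false,
            },
        };
        if stale {
            self.cached = Some((self.fetch)());
        }
        self.cached.as_ref().expect("cache was populated above")
    }
}
```

With a one-hour expiration and a 60-second margin, repeated calls to get reuse the cached value until the last minute before expiry, at which point fetch runs again.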

What can we do?

We can extend AmazonS3Builder to support this use case via an environment variable.

ByteBaker commented 1 week ago

@alamb since the linked PR is closed, can we mark this as closed?

edmondop commented 1 week ago

This is a new issue. Although it is related to the linked issues, it is not closed.

alamb commented 5 days ago

For additional context see https://github.com/apache/arrow-rs/issues/5143. Copying some of the info here:

I think the use case this feature would support is:

  1. User uses object_store indirectly via polars
  2. polars does not provide any way to modify / configure s3 connections at runtime

Since the users don't control the pola.rs source or distribution, they cannot use the existing object_store CredentialProvider trait.

The proposal on this ticket is to add a mechanism that can call out to an external program / process to get credentials. While less efficient, this would allow someone to plug in whatever authentication mechanism they wanted without having to change the source code.

@tustvold notes that we need to ensure this type of mechanism does not compromise system security (e.g. perhaps it has to be explicitly enabled rather than on by default).

Also, he mentioned that the Azure client has something similar, MicrosoftAzureBuilder::with_use_azure_cli, that we could use as a model.
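
To make the security point concrete, here is a small std-only sketch of one possible policy: the external command is only honoured when the user has opted in explicitly. The variable names EXAMPLE_ALLOW_CREDENTIAL_PROCESS and EXAMPLE_CREDENTIAL_PROCESS and both function names are made up for illustration; they are not existing object_store configuration keys or APIs. The command is split on whitespace and run without a shell, which is a simplification (quoted arguments are not handled).

```rust
use std::io;
use std::process::Command;

// Hypothetical variable names, used only for this sketch.
const OPT_IN_VAR: &str = "EXAMPLE_ALLOW_CREDENTIAL_PROCESS";
const COMMAND_VAR: &str = "EXAMPLE_CREDENTIAL_PROCESS";

/// Returns the configured command only when the user has opted in.
/// `lookup` abstracts over `std::env::var` so the policy can be tested.
fn configured_command<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let opted_in = lookup(OPT_IN_VAR).map_or(false, |v| v == "true");
    if !opted_in {
        return None;
    }
    // Ignore a command that is empty or only whitespace.
    lookup(COMMAND_VAR).filter(|cmd| !cmd.trim().is_empty())
}

/// Runs the command directly (no shell) and returns its stdout on success.
fn run_credential_process(command: &str) -> io::Result<String> {
    let mut parts = command.split_whitespace();
    let program = parts
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "empty command"))?;
    let output = Command::new(program).args(parts).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("credential process exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

In real use the lookup would be something like |k| std::env::var(k).ok(), and the returned stdout would then be parsed as the JSON document described earlier in the thread.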

alamb commented 2 days ago

See also related ticket in pola-rs that @tustvold filed: https://github.com/pola-rs/polars/issues/18979

tustvold commented 1 day ago

TBC I view this very much as a hack around an API limitation in Polars. I would prefer we try to fix this there before resorting to this: https://github.com/pola-rs/polars/issues/18979#issuecomment-2381289706

edmondop commented 1 day ago

Can you explain why you think credential process support is related to Polars? To me it is a gap in AmazonS3Builder in object_store.

tustvold commented 1 day ago

Your original request concerned supporting a broader range of auth within the context of polars. Credential process support was then proposed as a way to work around the inability to override the credential configuration within polars. By fixing this limitation of polars we not only provide a way for users to use credential process, via an AWS SDK that implements it, but also the full flexibility of all the other auth possibilities exposed by these SDKs.

I'd naturally prefer the solution that gives users the most flexibility and avoids needing to revisit this again when someone comes along requesting SSO or similar

alamb commented 3 hours ago

> I'd naturally prefer the solution that gives users the most flexibility and avoids needing to revisit this again when someone comes along requesting SSO or similar

I would also like to avoid a similar conversation with users of systems other than polars.

I agree that, in an ideal world, polars would perhaps implement user APIs to fully configure S3 auth via object_store using the existing APIs.

Even if they did this, however, I think we will continue to have similar conversations with other downtream users

I view this credential process not as a hack but as a general purpose configuration convention that works for any subsequent user (similarly to how object_store also supports the standard AWS configuration environment variables without any downstream crate configuration).

tustvold commented 2 hours ago

> Even if they did this, however, I think we will continue to have similar conversations with other downtream users

I think this proposal doesn't meaningfully help in this regard because:

So we'd end up with users continuing to come here asking about this, and we'd have to direct them to some object_store-specific environment configuration to call out to some external process they have to set up.

Encouraging the downstreams to expose, or otherwise utilize the object_store credential provider API would avoid this entirely.