apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

Arrow Flight SQL server AuthN and AuthZ #44456

Open Susmit07 opened 2 days ago

Susmit07 commented 2 days ago

Describe the usage question you have. Please include as many useful details as possible.

Hi Team,

If things go well, we will be using an Arrow Flight SQL server.

We will be publishing Java and Python SDKs for teams in our organisation to consume.

The Flight SQL server is written in Python, and we use the DuckDB query engine to query Parquet files on S3.

sql_query = "SELECT * FROM read_parquet('s3a://bucket/parquets/flights-1m-new-*.parquet');"

The performance numbers are really good.

I want to know the best practices for authenticating and authorizing a user who accesses Parquet data in a bucket. Every team will have a dedicated S3 bucket.

Won't initialising an S3 client with new access and secret keys on every request be resource intensive?

Component(s)

FlightRPC

lidavidm commented 2 days ago

Isn't this question more about DuckDB? Flight isn't involved in the S3 access here.

lidavidm commented 2 days ago

If you just want authentication/authorization, you can implement a middleware to check a bearer token or similar.
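As a rough illustration of the check such a middleware would perform, here is a minimal, stdlib-only sketch. It assumes the headers arrive as a dict mapping lowercase header names to lists of values, which is the shape pyarrow hands to `ServerMiddlewareFactory.start_call(info, headers)`. The `VALID_TOKENS` table, the `Unauthenticated` exception, and the `authenticate` function are hypothetical names for this example; in a real pyarrow server you would likely raise `pyarrow.flight.FlightUnauthenticatedError` instead.

```python
import hmac

# Hypothetical token table: a real deployment would query an identity
# provider or database instead of a hard-coded dict.
VALID_TOKENS = {"s3cr3t-team-a": "team-a", "s3cr3t-team-b": "team-b"}


class Unauthenticated(Exception):
    """Stand-in for pyarrow.flight.FlightUnauthenticatedError."""


def authenticate(headers):
    """Return the team for a valid 'authorization: Bearer <token>' header.

    `headers` maps lowercase header names to lists of str or bytes values.
    """
    values = headers.get("authorization", [])
    if not values:
        raise Unauthenticated("missing authorization header")
    value = values[0]
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise Unauthenticated("expected a bearer token")
    # Compare in constant time to avoid leaking token contents via timing.
    for known, team in VALID_TOKENS.items():
        if hmac.compare_digest(known.encode(), token.encode()):
            return team
    raise Unauthenticated("unknown token")
```

Calling `authenticate` from `start_call` and attaching the returned team to the per-call middleware instance would let each RPC handler decide which bucket the caller may read, which covers the authorization side.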

Susmit07 commented 2 days ago

Yeah, for authN I am thinking of using JWT-based authentication. You are correct that the S3 buckets are accessed by DuckDB. Can we initialise the S3 client with new access and secret keys every time? Do you see any issues here?

lidavidm commented 2 days ago

I can't answer for DuckDB - you should ask the DuckDB community.

I don't believe we have an example of JWT specifically but you can implement that yourself with middleware as mentioned.

Susmit07 commented 2 days ago

Sure, will do. Thanks @lidavidm