apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

Arrow Flight SQL server AuthN and AuthZ #44456

Open Susmit07 opened 1 month ago

Susmit07 commented 1 month ago

Describe the usage question you have. Please include as many useful details as possible.

Hi Team,

If things go well, we will be using an Arrow Flight SQL server.

We will be publishing SDKs for Java and Python for teams in our organisation to consume.

The Flight SQL server is written in Python, and we are using DuckDB as the query engine to query Parquet files on S3.

sql_query = "SELECT * FROM read_parquet('s3a://bucket/parquets/flights-1m-new-*.parquet');"

The performance numbers are really good.

I want to know the best practices for authenticating and authorizing a user who accesses Parquet data in a bucket. Every team will have its own dedicated S3 bucket.

Won't initialising the S3 client with new access and secret keys every time be resource-intensive?

Component(s)

FlightRPC

lidavidm commented 1 month ago

Isn't this question more about DuckDB? Flight isn't involved in the S3 access here.

lidavidm commented 1 month ago

If you just want authentication/authorization, you can implement a middleware to check a bearer token or similar.

Susmit07 commented 1 month ago

Yes, for authN I am thinking of using JWT-based authentication. You are correct that the S3 buckets are accessed by DuckDB. Can we initialise the S3 client with new access and secret keys every time? Do you see any issues with that?

lidavidm commented 1 month ago

I can't answer for DuckDB - you should ask the DuckDB community.

I don't believe we have an example of JWT specifically but you can implement that yourself with middleware as mentioned.
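As a rough illustration of the token check such a middleware could call, here is a standard-library sketch that verifies an HS256-signed JWT and its `exp` claim. It assumes a shared secret between the issuer and the server; the function names and the `team` claim are hypothetical, and in practice a maintained library such as PyJWT would likely be a better choice than hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a token; included only so the sketch can be tried locally."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature and expiry are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except Exception as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm so a token cannot choose its own verification method.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison of the received and recomputed signatures.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    exp = claims.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return claims
```

A server-side check might then look like `claims = verify_hs256(token, secret)` followed by a lookup of `claims["team"]` to pick the bucket the caller is allowed to read, raising an authentication error when `ValueError` is raised.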

Susmit07 commented 1 month ago

Sure, will do. Thanks @lidavidm