apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.5k stars 1.29k forks source link

[Feature Request] Provide user access to all tables except tableX #14076

Open piby180 opened 1 month ago

piby180 commented 1 month ago

We are currently using ZkBasicAuthAccessControlFactory to manage our users in Pinot.

With this, we can

  1. provide access to ALL tables
  2. provide access to specific tables. List of tables have to be provided explicitly here

We now have 100s of tables and we often get the requirement to provide access to all tables except tableX. With the current setup, the only way is to provide a list of all tables except tableX in POST /users rest call.

We would like to request to implement a new version of payload for

POST /users

{
    "username" : "user1"
    "password" : "test",
    "component" : "BROKER",
    "role" : "USER",
    "permissions" : ["READ"],
    "tables" : ["ALL"],
    "excludeTables" : ["tableX"]
}

Or another option could be the support of regular expressions

POST /users

{
    "username" : "user1"
    "password" : "test",
    "component" : "BROKER",
    "role" : "USER",
    "permissions" : ["READ"],
    "includeTables" : "vendorX*", // include all tables which starts with "vendorX"
    "excludeTables" : "vendorY*" // exclude all tables which starts with "vendorY"
}
engrravijain commented 1 month ago

Hey @yashmayya, I am new to this project, was looking out for issues that I can contribute to. Would it be ok if I start working on this issue if it's not a priority thing as I may take time to build this.

This looks like a valid use case to me as we faced similar problem while managing access for a large number of Big Query Tables.

cc: @mayankshriv

yashmayya commented 1 month ago

@engrravijain thanks for your interest in contributing to this project! I've assigned the issue to you.

engrravijain commented 1 month ago

Thanks Yash! Looking forward to it!

piby180 commented 1 month ago

Thanks @engrravijain I have thought about it overnight and I would really prefer the use of regular expressions if possible as it would give us more flexibility in setting authorization rules. Specifically the following pattern is also important to us "Give access to all tables whose names start with vendor_"

Regular expressions also support negative-assertions so the pattern "All tables except tablex" can also be written via regular expressions.

One proposal could be

POST /users

{
    "username" : "user1"
    "password" : "test",
    "component" : "BROKER",
    "role" : "USER",
    "permissions" : ["READ"],
    "tables" : ["ALL"] or "{regex}"
}

where tables could be a list or string OR a regular expression string. Also, while you are at this topic, it would also make sense to send proper query response for authentication or authorization failures.

Right now, we get the following error in 1.2.0 if there is an authentication or authorization issue and our developers are always confused what this error means.

DatabaseError: Query

{'sql': 'SELECT * FROM tablex LIMIT 10', 'queryOptions': 'useMultistageEngine=true'} timed out: Out of -1, only -1 responded, while needed was -1

I would love to see a clear response which says exactly whether it was authetication issue or authorization issue.

Let me know if you have any questions for me. Thanks!