Open majetideepak opened 5 months ago
@majetideepak Deepak, would you describe some use cases that need this information? Adding Identity struct is not enough I assume. What are the next steps to provide meaningful functionality?
@mbasmanova Masha, the immediate need for this is in the Arrow Flight Connector implementation. The authors requested this in the Presto issue here https://github.com/prestodb/presto/issues/22849
From the issue description, the Identity struct should be sufficient for now since the requirement is for the users to be able to specify the user credentials via extraCredentials
.
@ashkrisk, @tdcmeehan, can you confirm? Thanks.
@mbasmanova without an Identity API, the only way for connectors to read authentication credentials is through the catalog configuration file. This means that all presto users have the same level of access to the data source, which is not desirable in our case.
Specifying extraCredentials
allows multiple users connecting to the same presto cluster to connect to the same data source, but access only the subset of data they are authorized to.
I think adding an "identity" member to the QueryCtx
with a "user" and "credentials" field is good enough to address this.
@majetideepak @ashkrisk Are you saying that this new piece of information will be used only in Flight Arrow connector that's being developed outside of Velox codebase? If that's the case, it might be difficult to maintain this piece of code in the library since noone understands how it is being used. Would it be possible to add some comments in the code to explain why this piece is needed to prevent folks from accidentally deleting this as "dead code"?
Also, there are usually some security rules about handling sensitive information like user name and credentials. In particular for multi-tenant systems. Wondering how are you thinking about following best practices here.
@mbasmanova I will add some details on the usage. I will include some notes on the security rules/implications, especially in a multi-tenant system as well. I need to look at how Presto implements this.
This feature would be needed in Parquet Reader column level decryption. Parquet files supports columns to encrypted with a key. We at Uber would be using the user identity to verify if the user has access to encrypted column via a KMS service and use that information decrypt the column.
Description
Presto clients can provide extra credentials to a connector. See https://prestodb.io/docs/current/develop/client-protocol.html#client-request-headers. A related Prestissimo issue that requires these is https://github.com/prestodb/presto/issues/22849
The Presto protocol sends this information as part of the
TaskUpdateRequest
. https://github.com/prestodb/presto/blob/49f6566012bc67d9c06e48e74e1d2d6b4029b6e1/presto-native-execution/presto_cpp/presto_protocol/presto_protocol.h#L3086In Presto Java, these details are persisted in an Identity object under the security SPI. https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/security/Identity.java
The proposal is to add a similar API in Velox with only the minimal
user
andcredentials
fields. These fields are generic (user is std::string, credentials is std::unordered_map) and other clients can also use them for similar needs. The Presto Java Identity class has other fields (principal, roles, etc.) that are used for other use cases such as Kerberos support. These will be added later when needed.