facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0
3.53k stars 1.16k forks source link

Add Identity API to QueryCtx and ConnectorQueryCtx #10107

Open majetideepak opened 5 months ago

majetideepak commented 5 months ago

Description

Presto clients can provide extra credentials to a connector. See https://prestodb.io/docs/current/develop/client-protocol.html#client-request-headers. A related Prestissimo issue that requires these is https://github.com/prestodb/presto/issues/22849

The Presto protocol sends this information as part of the TaskUpdateRequest. https://github.com/prestodb/presto/blob/49f6566012bc67d9c06e48e74e1d2d6b4029b6e1/presto-native-execution/presto_cpp/presto_protocol/presto_protocol.h#L3086

In Presto Java, these details are persisted in an Identity object under the security SPI. https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/security/Identity.java

The proposal is to add a similar API in Velox with only the minimal user and credentials fields. These fields are generic (user is std::string, credentials is std::unordered_map) and other clients can also use them for similar needs. The Presto Java Identity class has other fields (principal, roles, etc.) that are used for other use cases such as Kerberos support. These will be added later when needed.

mbasmanova commented 5 months ago

@majetideepak Deepak, would you describe some use cases that need this information? Adding Identity struct is not enough I assume. What are the next steps to provide meaningful functionality?

majetideepak commented 5 months ago

@mbasmanova Masha, the immediate need for this is in the Arrow Flight Connector implementation. The authors requested this in the Presto issue here https://github.com/prestodb/presto/issues/22849 From the issue description, the Identity struct should be sufficient for now since the requirement is for the users to be able to specify the user credentials via extraCredentials.

@ashkrisk, @tdcmeehan, can you confirm? Thanks.

ashkrisk commented 5 months ago

@mbasmanova without an Identity API, the only way for connectors to read authentication credentials is through the catalog configuration file. This means that all presto users have the same level of access to the data source, which is not desirable in our case.

Specifying extraCredentials allows multiple users connecting to the same presto cluster to connect to the same data source, but access only the subset of data they are authorized to.

I think adding an "identity" member to the QueryCtx with a "user" and "credentials" field is good enough to address this.

mbasmanova commented 5 months ago

@majetideepak @ashkrisk Are you saying that this new piece of information will be used only in Flight Arrow connector that's being developed outside of Velox codebase? If that's the case, it might be difficult to maintain this piece of code in the library since noone understands how it is being used. Would it be possible to add some comments in the code to explain why this piece is needed to prevent folks from accidentally deleting this as "dead code"?

Also, there are usually some security rules about handling sensitive information like user name and credentials. In particular for multi-tenant systems. Wondering how are you thinking about following best practices here.

majetideepak commented 5 months ago

@mbasmanova I will add some details on the usage. I will include some notes on the security rules/implications, especially in a multi-tenant system as well. I need to look at how Presto implements this.

hitarth commented 2 months ago

This feature would be needed in Parquet Reader column level decryption. Parquet files supports columns to encrypted with a key. We at Uber would be using the user identity to verify if the user has access to encrypted column via a KMS service and use that information decrypt the column.