Closed dennishuo closed 2 months ago
@cgpoh if you have time, could you please verify whether SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION: true
satisfies your use case as an alternative to https://github.com/apache/polaris/pull/77 ?
@collado-mike you might be interested in validating whether my changes to dropTable(purge) make sense; as far as I can tell since TaskFileIOSupplier
is what ultimately provides purge credentials, the inline credential-creation was extraneous
@dennishuo , thanks! I will try to verify when it's merge
Hi @dennishuo, I just tested and the fix doesn't satisfy my use case. For my use case, the Spark job should not provide Azure credentials as environment variables for it to execute properly. I don't want our Azure credentials to be exposed through Spark Job and hence, I prefer the credential vending mode where Polaris Catalog is the only place that knows about Azure credentials and we can apply RBAC policy to Spark job via spark.sql.catalog.catalog_name.credential
@cgpoh Thanks for the extra info! To confirm, you still want Polaris to provide Spark with the ability to access Azure storage, but can't use SAS_TOKEN, right? Did you want Polaris to return something like a "shared key" to Spark instead? Were you also going to add this proposed new credential-vending approach to https://github.com/apache/polaris/pull/77 ?
@dennishuo , yes. My company disable the permission for us to use SAS_TOKEN and thus we need to fallback to use environment variables for ADLS. Sorry, what is the proposed new credential vending approach?
Description
The previous default initialization of a
catalogFileIO
that doesn't take into consideration any kind of subscoping is fundamentally a bug if it ever kicks in for real isolated-server scenarios where "application defaults" aren't appropriate for any storage interactions. Some unittest scenarios relied on this "default" initialization but this masked bugs where methods likeregisterTable
really do need to go through the full subscoping flow, but silently "succeeded" in test cases just because the application default credentials leaked into the test environment.Refactoring of FileIO initialization in
BasePolarisCatalog
now makes it more explicit as to when we expect to use defaults or when we require going through subscoping flows.Conveniently, this also provides a clean way to configure a server to skip all subscoping flows entirely and always allow application defaults and/or table-config inheritance to define the credentials used to interact with storage. (E.g. provides a way to achieve https://github.com/apache/polaris/pull/77/files just by setting SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION=true)
Overview:
refreshIOWithCredentials
catalogFileIO
by default; only set it inBasePolarisCatalogTest
where client/server behaviors are mixedregisterTable
to correctly use the subscoping flow to initialize a FileIOFixes https://github.com/apache/polaris/issues/156 and https://github.com/apache/polaris/pull/77
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Test Configuration:
Checklist:
Please delete options that are not relevant.