MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

mssparkutils.credentials.getSecret() doesn't work as expected when running notebook non-interactively #93812

Closed. jasonhorner closed this issue 2 years ago.

jasonhorner commented 2 years ago

https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#credentials-utilities

When retrieving secrets from Azure Key Vault, there appears to be a bug in the Spark utilities. The calls work fine when the notebook is run interactively.

When the notebook is run via a data integration pipeline:

mssparkutils.credentials.getSecret('azure key vault name','secret name','linked service name')

or

mssparkutils.credentials.getSecret('azure key vault name','secret name')

Secret retrieval fails with both of the above approaches.

The only way I was able to get this to work was to use the token library:

token_library.getSecret('key_vault_name', 'secret_name', 'key_vault_linked_service_name')
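A minimal sketch of the token library workaround wrapped as a fallback, assuming it runs in a Synapse notebook where `spark` is the active SparkSession and TokenLibrary is reachable through the py4j gateway at `com.microsoft.azure.synapse.tokenlibrary.TokenLibrary` (the class path used in the Synapse linked-service documentation). The helper names `get_secret` and `token_library_getter` are hypothetical; the getters are passed in as arguments so the fallback logic can be exercised without a Spark session.

```python
from typing import Callable, Optional

# Both mssparkutils.credentials.getSecret and TokenLibrary.getSecret are called
# here as (vault_name, secret_name, linked_service_name) -> secret value.
SecretGetter = Callable[[str, str, str], str]


def token_library_getter(spark) -> SecretGetter:
    """Return TokenLibrary.getSecret via the py4j JVM gateway (Synapse only; assumed class path)."""
    token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
    return token_library.getSecret


def get_secret(
    vault_name: str,
    secret_name: str,
    linked_service: str,
    primary: SecretGetter,
    fallback: Optional[SecretGetter] = None,
) -> str:
    """Try the primary getter (e.g. mssparkutils); if it raises, try the fallback."""
    try:
        return primary(vault_name, secret_name, linked_service)
    except Exception as primary_error:
        if fallback is None:
            raise  # no fallback configured: re-raise the original error
        try:
            return fallback(vault_name, secret_name, linked_service)
        except Exception as fallback_error:
            # Report both failures so the pipeline log shows why each path failed.
            raise RuntimeError(
                f"primary failed: {primary_error!r}; fallback failed: {fallback_error!r}"
            ) from fallback_error


# Hypothetical usage inside a Synapse notebook (argument values are placeholders):
# from notebookutils import mssparkutils
# value = get_secret(
#     "key_vault_name", "secret_name", "key_vault_linked_service_name",
#     primary=mssparkutils.credentials.getSecret,
#     fallback=token_library_getter(spark),
# )
```

Injecting the getters keeps the retry logic independent of Synapse; inside a notebook you could equally call the TokenLibrary directly once `token_library` is defined as above.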

It would be great to add a note to this section confirming that this is the current expected behavior and documenting the above workaround.

It would also be great to update the section: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#configure-access-to-azure-key-vault

to include information on configuring access under the Azure RBAC permission model. I believe the correct RBAC role is Key Vault Administrator.



himanshusinha-msft commented 2 years ago

Thanks for the feedback and for bringing this to our notice. We are reviewing the feedback and will update the document as appropriate.

himanshusinha-msft commented 2 years ago

@jasonhorner: As I understand it, the notebook fails when you run it from a pipeline but works when you execute it yourself. How is the linked service configured? Does the linked service have access to the AKV? (I doubt it.) When you run a notebook interactively, the notebook passes your credentials to AKV, so if you have access, the secret can be retrieved. When the notebook runs from a pipeline, it is different: the managed identity must have access to the AKV.


Let me know if you have any further queries.

himanshusinha-msft commented 2 years ago

We are closing this issue as we have not heard back from you. You can always reopen it if you think that is appropriate.

jasonhorner commented 2 years ago

@himanshusinha-msft: The linked service is configured correctly and has the necessary rights (see the third code sample). My point was that the token library call works when the linked service is passed, but the mssparkutils.credentials methods (the first two code samples) do not. I believe there is likely a bug in mssparkutils.credentials, and until it is fixed, the token library call seems to be the only viable workaround.

Note that this problem only occurs when the notebook is run through the Synapse Notebook activity. When it is run interactively, all three methods work.
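One way to confirm this report is to run all three call styles in the same session, once interactively and once from the Notebook activity, and compare the outcomes. A minimal sketch, assuming each call is wrapped in a zero-argument callable; `probe_secret_calls` is a hypothetical helper, and it records only "ok" or the exception type so secret values never reach the notebook output or pipeline logs.

```python
from typing import Callable, Dict


def probe_secret_calls(calls: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run each secret call and report 'ok' or the name of the exception it raised.

    The returned secret value is discarded so it is never printed.
    """
    results: Dict[str, str] = {}
    for label, call in calls.items():
        try:
            call()
            results[label] = "ok"
        except Exception as exc:  # py4j's Py4JJavaError also derives from Exception
            results[label] = f"failed: {type(exc).__name__}"
    return results


# Hypothetical usage inside a Synapse notebook (argument values are placeholders):
# from notebookutils import mssparkutils
# tl = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
# print(probe_secret_calls({
#     "mssparkutils + linked service": lambda: mssparkutils.credentials.getSecret("kv", "secret", "ls"),
#     "mssparkutils, no linked service": lambda: mssparkutils.credentials.getSecret("kv", "secret"),
#     "TokenLibrary + linked service": lambda: tl.getSecret("kv", "secret", "ls"),
# }))
```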

aidenpham commented 1 year ago

I managed to find a solution for mssparkutils:

First, add an Azure Key Vault linked service to your Synapse workspace. In Synapse Studio, go to Manage -> Linked services, then follow the instructions here: https://learn.microsoft.com/en-us/azure/data-factory/store-credentials-in-key-vault

Then take the name of the linked service and pass it as the third argument to getSecret(). (In the linked instructions, the linked service name is AzureKeyVault1.)

Full Python code:

from notebookutils import mssparkutils

mssparkutils.credentials.getSecret('key_vault_name', 'secret_name', 'key_vault_linked_service_name')