Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

Error using Service Principal with Certificate #309

Closed vsaroopchand closed 1 year ago

vsaroopchand commented 1 year ago

Describe the bug

I am currently facing an issue writing to Kusto using a Service Principal with a Certificate. As the Linked Service setup below shows, the connection succeeds when using Test Connection. However, writing to the Kusto cluster with authType LS fails with the error below. What am I missing?

Setup:

image

Script:

df.write \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("spark.synapse.linkedService", "MyKustoLinkService") \
    .option("kustoDatabase", "MyDb") \
    .option("kustoTable", "MyTestTable") \
    .option("authType", "LS") \
    .mode("Append") \
    .save()

Error:

Py4JJavaError: An error occurred while calling o3933.save.
: com.microsoft.azure.synapse.tokenlibrary.TokenLibrary$NonRetryableStatusException$1: POST failed with 'Bad Request' (400) and message: {"result":"DependencyError","errorId":"BadRequest","errorMessage":"[Code=, Target=, Message=]. TraceId : 5600b743-ee2b-4435-8309-9fbffbdd98e0 | client-request-id : 40bae8f4-12ed-4ac6-ae19-0e55867036fa. Error Component : LSR"}
    at com.microsoft.azure.synapse.tokenlibrary.TokenLibrary.$anonfun$invokeTokenService$7(TokenLibrary.scala:470)
    at com.twitter.util.Future.$anonfun$flatMap$1(Future.scala:1808)
    at com.twitter.util.Promise$FutureTransformer.liftedTree1$1(Promise.scala:240)
    at com.twitter.util.Promise$FutureTransformer.k(Promise.scala:240)
    at com.twitter.util.Promise$Transformer.apply(Promise.scala:215)
    at com.twitter.util.Promise$WaitQueue.com$twitter$util$Promise$WaitQueue$$run(Promise.scala:91)

To Reproduce

Set up a linked service to a Kusto cluster using a Service Principal with a Certificate from a Key Vault.

Expected behavior

The DataFrame is successfully written to the Kusto table.


ag-ramachandran commented 1 year ago

Hello @vsaroopchand

Thanks for confirming with a screengrab here. Let me set out the context, workarounds, and next steps (Noam mentioned this earlier; I am just adding the full context around it).

Context:

The Synapse Spark connector is a wrapper that provides abstractions over the Kusto Spark connector and handles authentication using the Synapse libraries. Internal to the Synapse auth repos is a set of libraries called TokenLibrary (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-python). To make things easier for users, the Synapse Spark connector calls this TokenLibrary API, gets the token, and then passes it when calling Kusto.
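
The two-step flow described above (get a token, then pass it to Kusto) can be pictured with a minimal Python sketch. This is not the connector's actual code: `build_kusto_write_options` and `fetch_token` are hypothetical names introduced only for illustration, and `fetch_token` stands in for the TokenLibrary call. The option keys `kustoDatabase` and `kustoTable` come from the script in this issue; `accessToken` is assumed to be the Kusto Spark connector's token option.

```python
from typing import Callable, Dict


def build_kusto_write_options(
    linked_service: str,
    database: str,
    table: str,
    fetch_token: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical stand-in for the wrapper: resolve a token, then build Kusto options."""
    # Step 1: ask the token service for a token for this linked service.
    # If this call raises (as with the 400 Bad Request in this issue),
    # the function stops here and no request is ever sent to Kusto.
    token = fetch_token(linked_service)

    # Step 2: hand the token to the Kusto connector along with the target.
    return {
        "kustoDatabase": database,
        "kustoTable": table,
        "accessToken": token,  # assumed option name for a pre-acquired token
    }
```

The point of the sketch is the ordering: a failure in the token step surfaces in the notebook before any Kusto ingestion is attempted, which matches the stack trace originating in `TokenLibrary` rather than in the Kusto connector.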

image

Unfortunately, the TokenLibrary auth part is broken for ManagedIdentity. See the illustration below, where it fails with a BadRequest for MSI and certificate-based auth and works for AAD app-based auth. This is the exception you see bubble up into the notebook execution.

image (1)

Workaround:

Next steps:

Raising an ICM with the Synapse team would probably help in getting their assistance to fix the issue with the TokenLibrary API.

vsaroopchand commented 1 year ago

Thank you @ag-ramachandran. Can you provide more context on the first workaround: "Easiest is to use AAD app based auth in the LinkedService, this will work."

ag-ramachandran commented 1 year ago

Hello @vsaroopchand, you will have to create an App ID / App Secret (using App registration) and use that in the Synapse wizard (the UI will ask for AppId, AppKey, and Tenant).

vsaroopchand commented 1 year ago

Thank you. Unfortunately I cannot use Client/Secret. I'll explore the MSAL option.
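
The thread does not show what the MSAL option looks like. One possible shape, offered only as an untested sketch, is to acquire a token with the certificate through the `msal` Python package and pass that token to the connector yourself. The helper names `kusto_scope` and `acquire_token_with_certificate` are hypothetical, as is the assumption that the private key and thumbprint have already been read from Key Vault; the `msal.ConfidentialClientApplication` and `acquire_token_for_client` calls are part of the MSAL Python library.

```python
def kusto_scope(cluster_url: str) -> str:
    """Build the client-credentials scope for a Kusto cluster URL."""
    return cluster_url.rstrip("/") + "/.default"


def acquire_token_with_certificate(
    tenant_id: str,
    client_id: str,
    private_key_pem: str,
    thumbprint: str,
    cluster_url: str,
) -> str:
    """Hypothetical helper: get an AAD token for Kusto using a certificate."""
    import msal  # third-party package, imported lazily: pip install msal

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        # Certificate credential: PEM private key plus the certificate thumbprint.
        client_credential={"private_key": private_key_pem, "thumbprint": thumbprint},
    )
    result = app.acquire_token_for_client(scopes=[kusto_scope(cluster_url)])
    if "access_token" not in result:
        # MSAL returns error details in the result dict instead of raising.
        raise RuntimeError(result.get("error_description", "token acquisition failed"))
    return result["access_token"]
```

The returned token would then presumably replace the `authType` and `spark.synapse.linkedService` options in the script above with something like `.option("kustoCluster", cluster_url)` and `.option("accessToken", token)`. Whether the Synapse datasource accepts a token this way is an assumption that needs to be verified against the connector documentation.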