Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

Unable to Authenticate Using Managed Identity #373

Open giantonius opened 5 months ago

giantonius commented 5 months ago

Describe the bug

I tried to follow the example in https://github.com/Azure/azure-kusto-spark/blob/master/docs/Authentication.md#managed-identity-authentication to authenticate using a managed identity, and ran into multiple issues. I am running the code in an Azure Databricks environment and have downloaded the package (com.microsoft.azure.kusto:kusto-spark_3.0_2.12:5.0.6).

To Reproduce

Steps to reproduce the behavior:

NameError: name 'KustoSinkOptions' is not defined when running the code snippet below:

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option(KustoSinkOptions.KUSTO_CLUSTER, "baseplatform.westus") \
    .option(KustoSinkOptions.KUSTO_DATABASE, "WHEA") \
    .option(KustoSinkOptions.KUSTO_TABLE, "TestManagedId") \
    .option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH, true.toString) \
    .option(KustoSinkOptions.KUSTO_MANAGED_CLIENT_ID, "xxxx") \
    .mode(SaveMode.Append) \
    .save()

java.security.InvalidParameterException: KUSTO_DATABASE parameter is missing. Must provide a destination database name. This occurs when running either of the code snippets below:

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option("KustoSinkOptions.KUSTO_CLUSTER", "baseplatform.westus") \
    .option("KustoSinkOptions.KUSTO_DATABASE", "WHEA") \
    .option("KustoSinkOptions.KUST_TABLE", "TestManagedId") \
    .option("KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH", True) \
    .option(KustoSinkOptions.KUSTO_MANAGED_CLIENT_ID, "xxxx") \
    .mode("Append") \
    .save()

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option("KUSTO_CLUSTER", "baseplatform.westus") \
    .option("KUSTO_DATABASE", "WHEA") \
    .option("KUSTO_TABLE", "TestManagedId") \
    .option("KUSTO_MANAGED_IDENTITY_AUTH", True) \
    .option("KUSTO_MANAGED_CLIENT_ID", "xxxx") \
    .mode("Append") \
    .save()

ag-ramachandran commented 4 months ago

Hi @giantonius, while we check the code options for any bugs, you may want to check whether ADB supports propagation of managed identity.

As for ADB, I will check how we can test with https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi and see if there is something that needs fixing.

ag-ramachandran commented 4 months ago

Hello @giantonius

Sorry, I did not look at this closely enough the first time. Both code snippets have minor mistakes.

import com.microsoft.kusto.spark.datasink.KustoSinkOptions // you have to import this manually
import org.apache.spark.sql.SaveMode // needed for SaveMode.Append

df.write.format("com.microsoft.kusto.spark.datasource")
  .option(KustoSinkOptions.KUSTO_CLUSTER, "baseplatform.westus")
  .option(KustoSinkOptions.KUSTO_DATABASE, "WHEA")
  .option(KustoSinkOptions.KUSTO_TABLE, "TestManagedId")
  .option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH, true.toString)
  .option(KustoSinkOptions.KUSTO_MANAGED_CLIENT_ID, "xxxx")
  .mode(SaveMode.Append)
  .save()

This one is wrong because the constant names are enclosed in quotes, so they are passed as literal strings:

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option("KustoSinkOptions.KUSTO_CLUSTER", "baseplatform.westus") \
    .option("KustoSinkOptions.KUSTO_DATABASE", "WHEA") \
    .option("KustoSinkOptions.KUST_TABLE", "TestManagedId") \
    .option("KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH", True) \
    .option(KustoSinkOptions.KUSTO_MANAGED_CLIENT_ID, "xxxx") \
    .mode("Append") \
    .save()
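To illustrate why the quoted names fail, here is a minimal Python sketch that checks a set of option keys against the literal key names mentioned in this thread. The `KNOWN_KEYS` set and the `unrecognized_keys` helper are hypothetical and only cover the five options discussed here; the connector itself defines more options and performs its own validation.

```python
from typing import Iterable, List

# Literal option keys quoted in this thread (not the connector's full list).
KNOWN_KEYS = frozenset({
    "kustoCluster",
    "kustoDatabase",
    "kustoTable",
    "managedIdentityAuth",
    "managedIdentityClientId",
})


def unrecognized_keys(keys: Iterable[str]) -> List[str]:
    """Return the keys that do not match any known literal key, in order."""
    return [key for key in keys if key not in KNOWN_KEYS]


# Quoting a constant name passes that name itself as the key, so the
# database option is never set under the key the connector looks for.
quoted = ["KustoSinkOptions.KUSTO_CLUSTER", "KustoSinkOptions.KUSTO_DATABASE"]
literal = ["kustoCluster", "kustoDatabase"]
```

Running `unrecognized_keys(quoted)` flags both entries, while `unrecognized_keys(literal)` returns an empty list.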

If you want to use string literals as in option 3, the option names have to be changed to the ones below. These come from the KustoOptions/KustoSinkOptions classes in this repo:

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option("kustoCluster", "baseplatform.westus") \
    .option("kustoDatabase", "WHEA") \
    .option("kustoTable", "TestManagedId") \
    .option("managedIdentityAuth", True) \
    .option("managedIdentityClientId", "xxxx") \
    .mode("Append") \
    .save()

Please use (1) or (3); either should go through. Note that your ADB has to support managed identity, which is outside the scope of this connector.
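For PySpark notebooks, where the Scala `KustoSinkOptions` object cannot be imported as a Python module, one way to avoid mistyped keys is to define the literal names once and build the options from them. This is a rough sketch, not part of the connector: the constant names and the `kusto_sink_options` helper are hypothetical, and the key strings are the ones quoted in this thread.

```python
from typing import Dict, Optional

# Literal option keys as quoted in this thread.
KUSTO_CLUSTER = "kustoCluster"
KUSTO_DATABASE = "kustoDatabase"
KUSTO_TABLE = "kustoTable"
KUSTO_MANAGED_IDENTITY_AUTH = "managedIdentityAuth"
KUSTO_MANAGED_CLIENT_ID = "managedIdentityClientId"


def kusto_sink_options(
    cluster: str,
    database: str,
    table: str,
    managed_client_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build sink options for managed identity auth as a plain dict."""
    options = {
        KUSTO_CLUSTER: cluster,
        KUSTO_DATABASE: database,
        KUSTO_TABLE: table,
        # Mirrors true.toString from the Scala example.
        KUSTO_MANAGED_IDENTITY_AUTH: "true",
    }
    # Only set the client ID when one is supplied.
    if managed_client_id is not None:
        options[KUSTO_MANAGED_CLIENT_ID] = managed_client_id
    return options


# Usage in a notebook (needs Spark and the connector, so shown as a comment):
# df.write.format("com.microsoft.kusto.spark.datasource") \
#     .options(**kusto_sink_options("baseplatform.westus", "WHEA", "TestManagedId", "xxxx")) \
#     .mode("Append") \
#     .save()
```

Whether the write then authenticates still depends on the cluster's managed identity setup, which this helper does not address.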

giantonius commented 4 months ago

Hi @ag-ramachandran, I tried options 1 and 3 and got these errors:

Option 1 error: ModuleNotFoundError: No module named 'com.microsoft'

import com.microsoft.kusto.spark.datasink.KustoSinkOptions

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option(KustoSinkOptions.KUSTO_CLUSTER, "baseplatform.westus") \
    .option(KustoSinkOptions.KUSTO_DATABASE, "WHEA") \
    .option(KustoSinkOptions.KUSTO_TABLE, "TestManagedId") \
    .option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH, true.toString) \
    .option(KustoSinkOptions.KUSTO_MANAGED_CLIENT_ID, "xxxx") \
    .mode(SaveMode.Append) \
    .save()

I have installed com.microsoft.azure.kusto:kusto-spark_3.0_2.12:5.0.6 on my Spark cluster on Azure Databricks

Option 3 error: IllegalArgumentException: scopes is null or empty

df.write.format("com.microsoft.kusto.spark.datasource") \
    .option("kustoCluster", "baseplatform.westus") \
    .option("kustoDatabase", "WHEA") \
    .option("kustoTable", "TestManagedId") \
    .option("managedIdentityAuth", True) \
    .option("managedIdentityClientId", "xxxx") \
    .mode("Append") \
    .save()

Additionally, can you clarify "your ADB has to support ManagedIdentity, that is outside the scope of this connector"? Do you have additional resources to set up managed identity support in ADB?