Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

KUSTO_MANAGED_IDENTITY_AUTH is not a member of com.microsoft.kusto.spark.common.KustoOptions and com.microsoft.kusto.spark.datasink.KustoSinkOptions #357

Closed aniarpu closed 5 months ago

aniarpu commented 8 months ago

I get an error when trying to authenticate to Kusto with a managed identity from Spark.

These are the errors I see:

  1. value KUSTO_MANAGED_IDENTITY_CLIENT_ID is not a member of object com.microsoft.kusto.spark.datasink.KustoSinkOptions
  2. value KUSTO_MANAGED_IDENTITY_AUTH is not a member of object com.microsoft.kusto.spark.datasink.KustoSinkOptions

Using the trait KustoOptions instead results in the same errors:

  1. error: value KUSTO_MANAGED_IDENTITY_CLIENT_ID is not a member of com.microsoft.kusto.spark.common.KustoOptions
  2. value KUSTO_MANAGED_IDENTITY_AUTH is not a member of com.microsoft.kusto.spark.common.KustoOptions

Is it still possible to authenticate using managed identities?

ag-ramachandran commented 8 months ago

Hello @aniarpu, please provide the code snippet you are using. Is it a read or a write? We'll guide you through the next steps.

aniarpu commented 8 months ago

Hi @ag-ramachandran, I am trying to perform a write. Here is the code snippet I am running from a Synapse notebook. I am using the following jar: kusto-spark_3.0_2.12-5.0.4-jar-with-dependencies.jar

```scala
import com.microsoft.kusto.spark.datasink.KustoSinkOptions
import com.microsoft.kusto.spark.datasource
import com.microsoft.kusto.spark.sql.extension.SparkExtension.DataFrameReaderExtension
import com.microsoft.kusto.spark.utils.{KustoDataSourceUtils => KDSU}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, LongType, StructField, StructType}
import com.microsoft.kusto.spark.common.KustoOptions

val schema = StructType(
  List(
    StructField("BinaryName", StringType, true),
    StructField("Etime", LongType, true),
    StructField("Ftime", LongType, true),
    StructField("SQLizerPartitionIndex", LongType, true)
  )
)
var df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

df = df.union(Seq(("binary1", 100, 100, 1)).toDF)

df.
  write.
  format("com.microsoft.kusto.spark.datasource").
  option(KustoSinkOptions.KUSTO_CLUSTER, "ossec").
  option(KustoSinkOptions.KUSTO_DATABASE, "IsoPlat").
  option(KustoSinkOptions.KUSTO_TABLE, "AIS2").
  option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH, true.toString).
  option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_CLIENT_ID, "").
  mode("Append").
  save()
```

ag-ramachandran commented 8 months ago

Two things, @aniarpu:

a) In Synapse, you should use a SystemManagedIdentity to create a linked service and then reference that linked service from the code. I doubt MI was tested this way in Synapse.

b) The format is wrong. It should be the following (refer to Samples): .format("com.microsoft.kusto.spark.synapse.datasource")

Also, I just confirmed from the git blame that the version used in Synapse is 3.1.16, which does not support ManagedIdentity as a Spark option (hence the compilation error).

Git log

aniarpu commented 8 months ago

Thanks for the quick reply!

a) For this option, did you mean something like this? option("spark.synapse.linkedService","DataExplorer_OSSec_IsoPlat"). I did try this, but it uses device authentication, and I'm trying to avoid that.

b) I did try this, but I think my Synapse notebook does not have the package, as I got the error below. To get around it, I ended up using the format shown in my snippet. Error - java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.kusto.spark.synapse.datasource.

ag-ramachandran commented 8 months ago

Hi @aniarpu

In Synapse, we can use a SystemManagedIdentity by creating a linked service (UserManagedIdentity is not supported). For reference, see [LinkedService](https://learn.microsoft.com/en-us/azure/data-factory/concepts-linked-services?tabs=data-factory) and [SystemManagedIdentity](https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-service-identity). The linked service name can then be used from the notebook.

As for the format com.microsoft.kusto.spark.synapse.datasource. , do you have an extra dot at the end?

aniarpu commented 8 months ago

Thanks for the links. My Synapse notebook session is not running as a managed identity; maybe that has something to do with it?

No, the format doesn't have an extra dot at the end. I must have accidentally added a full stop out of habit.

So, summarizing (please let me know if this is incorrect),

  1. In a Synapse notebook, the only format that can be used is com.microsoft.kusto.spark.synapse.datasource?
  2. MI is not a supported option in Synapse with com.microsoft.kusto.spark.datasource.
  3. I have to use only system managed identities, and no user managed identities.

ag-ramachandran commented 8 months ago

Summary :

  1. Yes
  2. Yes
  3. Yes

Synapse itself need not run with MI. The linked service can use ManagedIdentity independently of the Synapse session.
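
For readers landing on this thread, the summary above can be sketched as a write that goes through a linked service instead of the MI options. This is a minimal sketch, not a tested configuration: it assumes a linked service already exists and is set up to authenticate with the workspace's system-assigned managed identity, and that the Synapse runtime ships the `com.microsoft.kusto.spark.synapse.datasource` format. The helper name `writeViaLinkedService` is hypothetical; the linked service, database, and table names are the ones mentioned earlier in the thread.

```scala
import org.apache.spark.sql.DataFrame
import com.microsoft.kusto.spark.datasink.KustoSinkOptions

// Hypothetical helper (not part of the connector): writes a DataFrame to Kusto
// through a Synapse linked service rather than the KUSTO_MANAGED_IDENTITY_* options,
// which the connector version bundled with Synapse does not expose.
def writeViaLinkedService(
    df: DataFrame,
    linkedService: String, // name of an existing linked service (assumed to use system-assigned MI)
    database: String,
    table: String): Unit = {
  df.write
    .format("com.microsoft.kusto.spark.synapse.datasource") // Synapse-specific format from this thread
    .option("spark.synapse.linkedService", linkedService)   // option key quoted earlier in this thread
    .option(KustoSinkOptions.KUSTO_DATABASE, database)
    .option(KustoSinkOptions.KUSTO_TABLE, table)
    .mode("Append")
    .save()
}

// Example call with the names used earlier in this thread:
// writeViaLinkedService(df, "DataExplorer_OSSec_IsoPlat", "IsoPlat", "AIS2")
```

Which credentials are actually used depends on how the linked service is configured. As reported above, a linked service can still fall back to device authentication, so the linked service's authentication method is worth checking first.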