Azure / azure-kusto-spark

Apache Spark Connector for Azure Kusto
Apache License 2.0

Cannot write to ADX from Azure Databricks using Kusto connector for pyspark "com.microsoft.kusto.spark.datasource" #343

Closed pk2k14 closed 1 year ago

pk2k14 commented 1 year ago

I have been trying to write a DataFrame in Azure Databricks to an ADX table using the Kusto connector for PySpark. The command runs for 8-10 minutes and then fails with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 30.0 failed 4 times, most recent failure: Lost task 0.3 in stage 30.0 (TID 1556) (100.127.243.6 executor 0): java.io.IOException:

Caused by: com.microsoft.azure.storage.StorageException:
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:220)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1254)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1226)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:469)
    ... 9 more
Caused by: java.net.UnknownHostException: 840trldkcfxsoccteu2npe01.blob.core.windows.net
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:613)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1167)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1061)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1347)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1322)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:100)

I have the following libraries installed on my cluster as well.

image

In the same notebook, I am able to read the table data from the same ADX database using the same connector.

ag-ramachandran commented 1 year ago

Hello @pk2k14

What is your networking setup? Are you running Kusto in a private network, or Azure Databricks in a separate subnet?

Regards Ram

pk2k14 commented 1 year ago

Hi @ag-ramachandran,

Yes, Kusto is running in a private network, but we have all the network peerings configured between ADX and ADB.

ag-ramachandran commented 1 year ago

Hi @pk2k14

During ingestion, the connector uses internal storage accounts that belong to Kusto (and are managed by Kusto).

To find these storage accounts, connect to the Kusto ingest endpoint in Kusto Explorer.

image

Execute: .show ingestion resources

You will get the set of storage accounts being used, as shown below:

image

Whitelist traffic to these storage accounts; without this, ingestion will not work.
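For reference, a minimal sketch of turning the storage URIs returned by .show ingestion resources into a list of hostnames to whitelist. It assumes the URIs have already been copied out of the result as plain strings; the function name storage_hosts and the sample URIs are hypothetical and do not refer to real accounts.

```python
from urllib.parse import urlsplit


def storage_hosts(resource_uris):
    """Return the sorted, de-duplicated hostnames found in a list of URIs."""
    hosts = set()
    for uri in resource_uris:
        # urlsplit drops the path and the SAS query string; only the host is kept.
        host = urlsplit(uri.strip()).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical sample values for illustration only (not real storage accounts).
sample_uris = [
    "https://exampleacct00.blob.core.windows.net/container-a?sv=REDACTED",
    "https://exampleacct00.queue.core.windows.net/queue-a?sv=REDACTED",
    "https://exampleacct01.blob.core.windows.net/container-b?sv=REDACTED",
    "https://exampleacct00.blob.core.windows.net/container-c?sv=REDACTED",
]

for host in storage_hosts(sample_uris):
    print(host)
```

Each distinct hostname printed is one endpoint that the Databricks network would need a route to.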

Queries with readMode set to ForceSingleMode will work. If you set readMode to ForceDistributedMode, the read will fail as well, because it accesses storage that is blocked.
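As an illustration of the read modes mentioned above, here is a hedged sketch of building reader options for the connector from PySpark. The option keys kustoCluster, kustoDatabase and kustoQuery are assumptions based on the connector's documented option names, and authentication options are deliberately left out; only readMode and its two values come from this thread. The helper names kusto_read_options and read_from_kusto are hypothetical.

```python
KUSTO_FORMAT = "com.microsoft.kusto.spark.datasource"
READ_MODES = {"ForceSingleMode", "ForceDistributedMode"}


def kusto_read_options(cluster, database, query, read_mode="ForceSingleMode"):
    """Build a dict of reader options (authentication options not included)."""
    if read_mode not in READ_MODES:
        raise ValueError(f"unsupported readMode: {read_mode!r}")
    return {
        "kustoCluster": cluster,    # assumed option key
        "kustoDatabase": database,  # assumed option key
        "kustoQuery": query,        # assumed option key
        "readMode": read_mode,      # named in this thread
    }


def read_from_kusto(spark, options):
    """Apply the options to a Spark reader; `spark` is an existing SparkSession."""
    reader = spark.read.format(KUSTO_FORMAT)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

In a notebook, something like read_from_kusto(spark, {**kusto_read_options(...), **auth_options}) would then issue the read, where auth_options is whatever credential configuration the cluster already uses.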

pk2k14 commented 1 year ago

Assigning contributor access on the ADX should enable it to write to underlying storage as well, correct?

ag-ramachandran commented 1 year ago

@pk2k14 the issue is not permissions; it is the network route to that host.
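One way to confirm this from a Databricks notebook is to check whether the storage hostname resolves at all, since the java.net.UnknownHostException in the stack trace is a name-resolution failure rather than an authorization error. The sketch below is a minimal check under that assumption; can_resolve is a hypothetical helper, and the resolver parameter exists only so the function can be exercised without network access.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if name resolution for `host` succeeds, False if the lookup fails."""
    try:
        resolver(host, 443)
        return True
    except socket.gaierror:
        # Python's counterpart of a failed hostname lookup.
        return False


# Example call from a notebook cell, using the host from the stack trace:
# can_resolve("840trldkcfxsoccteu2npe01.blob.core.windows.net")
```

A False result from the Databricks cluster points to DNS or routing for the private network, which role assignments on ADX would not change. Note that the lookup runs on the node where the cell executes, which may differ from the executors that perform the upload.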

I think you are also following up on this with MS in a separate channel? Would you like to continue there? ramacg is my handle at MS.

pk2k14 commented 1 year ago

Yeah sure, we can discuss it over there.