Write to Kusto in Synapse with option "sparkIngestionPropertiesJson" always failed in spark 3.3

xiaoshiyi123 commented 10 months ago

Describe the bug Hi Team, we want to upgrade the Spark pool from 3.2 to 3.3. However, when we write to Kusto with the "sparkIngestionPropertiesJson" option, the Spark job neither finishes nor fails; it just keeps running for a long time.

To Reproduce Steps to reproduce the behavior:

Create a Spark pool with Spark 3.3
Create a notebook and attach it to this Spark pool
Read a DataFrame and write it to Kusto with the option "sparkIngestionPropertiesJson"
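The exact notebook code is not included in the report, so the following is only a minimal sketch of what the last step might look like in PySpark. The data source name, the option keys other than "sparkIngestionPropertiesJson", the "flushImmediately" JSON key, and the linked service, database, and table names are all assumptions or placeholders rather than values taken from this issue.

```python
import json

# Assumed data source name for the Kusto connector bundled with Synapse.
SYNAPSE_KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"


def build_repro_options(linked_service, database, table):
    """Return writer options that include sparkIngestionPropertiesJson."""
    # Assumption: the ingestion properties JSON sets flushImmediately to true.
    ingestion_properties = {"flushImmediately": True}
    return {
        "spark.synapse.linkedService": linked_service,
        "kustoDatabase": database,
        "kustoTable": table,
        "sparkIngestionPropertiesJson": json.dumps(ingestion_properties),
    }


def write_to_kusto(df, options):
    """Append a Spark DataFrame to Kusto using the given writer options."""
    (df.write
       .format(SYNAPSE_KUSTO_FORMAT)
       .options(**options)
       .mode("append")
       .save())


# Placeholder names, not taken from the issue.
repro_options = build_repro_options("MyKustoLinkedService", "MyDatabase", "MyTable")
# In a Synapse notebook with a DataFrame named df:
# write_to_kusto(df, repro_options)
```

Keeping the option building separate from the write call makes it easy to compare the options used on Spark 3.2 and Spark 3.3 side by side.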

Expected behavior Write to Kusto successfully.

Screenshots Using sp (the "sparkIngestionPropertiesJson" option):

Not using sp, the write succeeds.


ag-ramachandran commented 10 months ago

Hi @xiaoshiyi123

a) There are a couple of challenges here. By using SparkIngestionPropertiesJson, we are setting the FlushImmediately flag to true, which we do not recommend.

Here is how the internals work

DataFrame ----> Write to blob ------> Ingest this blob

To optimize for throughput in ingestion, the size of the blob is critical. Kusto is optimized for a few large blobs, as opposed to many small blobs. Please use the right batching policy in Kusto, and you can get rid of SparkIngestionProperties altogether.

(Refer : https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/batchingpolicy)
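As a rough illustration of setting a batching policy, the helper below builds a Kusto management command for a table's ingestion batching policy. It is a sketch under assumptions: the table name and the limit values are placeholders, not recommendations, and the command shape and property names should be checked against the batching policy page linked above.

```python
import json


def ingestion_batching_command(table, time_span, max_items, max_raw_size_mb):
    """Build an '.alter table ... policy ingestionbatching' command string."""
    policy = {
        "MaximumBatchingTimeSpan": time_span,
        "MaximumNumberOfItems": max_items,
        "MaximumRawDataSizeMB": max_raw_size_mb,
    }
    # Assumption: the policy object is passed as a single-quoted JSON string.
    return f".alter table {table} policy ingestionbatching '{json.dumps(policy)}'"


# Placeholder table name and illustrative limits only.
command = ingestion_batching_command("MyTable", "00:05:00", 500, 1024)
print(command)
```

The resulting string would be run against the target database, for example from the Azure Data Explorer query window, by someone with permission to alter table policies.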

b) You can try the Queued writeMode. IIRC, the connector version in Synapse had an issue where the shards that were to be merged were queried incorrectly, so that could be a cause too. If you still want to use the FlushImmediately flag (not recommended, since it results in no aggregation and many smaller ingestions), please use the queued write option:

.option("writeMode","Queued")

Refer: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSink.md (see writeMode)
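For context, here is a minimal PySpark sketch of adding the queued write option to an existing set of writer options. Only the "writeMode" option and its "Queued" value come from this thread; the helper names, the data source name, and the other option keys and values are assumptions or placeholders.

```python
# Assumed data source name for the Kusto connector bundled with Synapse.
SYNAPSE_KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"


def with_queued_write_mode(options):
    """Return a copy of the writer options with writeMode set to Queued."""
    queued = dict(options)
    queued["writeMode"] = "Queued"
    return queued


def write_to_kusto(df, options):
    """Append a Spark DataFrame to Kusto using the given writer options."""
    (df.write
       .format(SYNAPSE_KUSTO_FORMAT)
       .options(**options)
       .mode("append")
       .save())


# Placeholder connection options, not taken from the issue.
base_options = {
    "spark.synapse.linkedService": "MyKustoLinkedService",
    "kustoDatabase": "MyDatabase",
    "kustoTable": "MyTable",
}
queued_options = with_queued_write_mode(base_options)
# In a Synapse notebook with a DataFrame named df:
# write_to_kusto(df, queued_options)
```

Copying the dictionary instead of mutating it keeps the original options available for comparison with the default write mode.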

xiaoshiyi123 commented 10 months ago

Thanks @ag-ramachandran, .option("writeMode","Queued") solved my problem. Thank you for your kind answer and suggestions.

Azure / azure-kusto-spark

Write to Kusto in Synapse with option "sparkIngestionPropertiesJson" always failed in spark 3.3 #342