Closed xiaoshiyi123 closed 10 months ago
Hi @xiaoshiyi123
a) There are a couple of challenges here. By using sparkIngestionPropertiesJson, you are setting the flushImmediately flag to true, which we do not recommend.
Here is how the internals work:
DataFrame ----> Write to blob ------> Ingest this blob
To optimize ingestion throughput, blob size is critical. Kusto is optimized for a few large blobs, as opposed to many small blobs. Please set an appropriate batching policy on the Kusto side and you can drop sparkIngestionProperties altogether.
(Refer: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/batchingpolicy)
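For illustration, a table-level ingestion batching policy can be set with a management command like the one below. This is a hedged sketch based on the batching-policy doc linked above; the table name and the limits are hypothetical and should be tuned to your workload.

```kusto
// Hypothetical example: seal a batch after 5 minutes, 500 blobs, or 1 GB of raw
// data, whichever limit is reached first. Replace MyTable with your table name.
.alter table MyTable policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:05:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
```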
b) You can try the Queued writeMode. IIRC, the connector version shipped in Synapse had an issue where the shards to be merged were queried incorrectly, so that could also be a cause. If you still want to use the flushImmediately flag (not recommended: it results in no aggregation and many small ingestions), please use the queued write option:
.option("writeMode","Queued")
Refer to the writeMode section in https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSink.md
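As a sketch of how the option above fits into a full write, the snippet below assembles the sink options for a queued write, assuming the azure-kusto-spark connector and PySpark. The cluster, database, and table names are hypothetical placeholders; the options are collected in a plain dict so they are easy to inspect, and the actual DataFrame write is shown in comments because it needs a live Spark session.

```python
# Kusto sink options for queued ingestion (all endpoint names are hypothetical).
kusto_write_options = {
    "kustoCluster": "https://mycluster.kusto.windows.net",  # hypothetical cluster
    "kustoDatabase": "MyDatabase",                          # hypothetical database
    "kustoTable": "MyTable",                                # hypothetical table
    "writeMode": "Queued",  # queued ingestion instead of flushImmediately
}

# With a live Spark session and DataFrame `df`, the write would look like
# (not executed here):
# (df.write
#    .format("com.microsoft.kusto.spark.datasource")
#    .options(**kusto_write_options)
#    .mode("Append")
#    .save())

print(kusto_write_options["writeMode"])
```

With writeMode set to Queued, the connector hands blobs to Kusto's queued ingestion path, letting the server-side batching policy aggregate them before ingestion.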
Thanks @ag-ramachandran .option("writeMode","Queued") solved my problem. Thanks for your kind answer and suggestions.
Describe the bug: Hi Team, we want to upgrade the Spark pool from 3.2 to 3.3. But when we use "sparkIngestionPropertiesJson" to write to Kusto, the Spark job neither completes nor fails; it keeps running for a long time.
Expected behavior: Write to Kusto successfully.
Screenshots: with sparkIngestionPropertiesJson the job hangs; without it, the write succeeds.