Have you tried to explore what exactly causes the timeout? Is it possible the table was in use and the query, as configured above, could not acquire an exclusive lock on the underlying table? Regarding "the table (destination to insert) is not responding for some time": that would be expected, since you are using a table lock. If you need to load data and use the table simultaneously, I recommend removing the option "bulkCopyTableLock": "true" and reducing the batch size (see the sketch below). This will lengthen the loading time, but it will be friendlier to what you are trying to achieve. If your requirements are more stringent, please consider partitioning the table loaded from Spark.
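For illustration, a minimal sketch of that adjusted configuration, assuming the azure-sqldb-spark connector's `Config` / `bulkCopyToSqlDB` API and mirroring the option names from the config posted further down in this thread; `df`, the placeholder connection values, and the batch size of 10000 are assumptions to adapt to your environment:

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Same options as the original config, but without "bulkCopyTableLock"
// and with a smaller batch size, so concurrent readers are not blocked
// for the duration of the whole load.
val config = Config(Map(
  "url"               -> "<URL>",
  "databaseName"      -> "<DB_NAME>",
  "dbTable"           -> "<DB_TABLE>",
  "bulkCopyBatchSize" -> "10000",   // smaller batches commit more often
  "bulkCopyTimeOut"   -> "600",
  "driver"            -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"
))

// df is the DataFrame being loaded.
df.bulkCopyToSqlDB(config)
```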
Does your destination table have indexes? I had performance issues with large tables that have indexes.
Dropping indexes before the bulk insert and re-creating them at the end helped me improve performance significantly (a sketch of this pattern follows below). I was able to bulk insert 180 million records into a heap within 23 minutes, using an Azure SQL elastic pool (General Purpose, 8 vCores) and a cluster with 8 DS12 v2 worker nodes.
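One way to script that pattern from the Spark driver is with plain JDBC, as in the sketch below; the index and table names (`IX_DestTable_Col1`, `dbo.DestTable`) and the connection string are hypothetical and need to be replaced with your own:

```scala
import java.sql.DriverManager

// Hypothetical connection string; fill in your own server and credentials.
val jdbcUrl =
  "jdbc:sqlserver://<SERVER>;databaseName=<DB_NAME>;user=<USER>;password=<PASSWORD>"

// Helper to run a single DDL statement against the destination database.
def runDdl(sql: String): Unit = {
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    val stmt = conn.createStatement()
    try stmt.execute(sql) finally stmt.close()
  } finally conn.close()
}

// 1. Drop the index so the bulk insert targets a heap.
runDdl("DROP INDEX IF EXISTS IX_DestTable_Col1 ON dbo.DestTable")

// 2. Run the bulk insert here, e.g. df.bulkCopyToSqlDB(config).

// 3. Recreate the index after the load completes.
runDdl("CREATE NONCLUSTERED INDEX IX_DestTable_Col1 ON dbo.DestTable (Col1)")
```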
The link below may be useful: https://techcommunity.microsoft.com/t5/datacat/turbo-boost-data-loads-from-spark-using-sql-spark-connector/ba-p/305523#
@demoV as your question is quite environment-specific, it is difficult to give prescriptive guidance. The general tips I would mention are:
Please let me know if there are any other questions, else I would like to close this thread by July 3, 2020.
Closing per previous comment and subsequent inactivity.
Hi,
We are processing ~400M records. When we try to bulk insert the records, the Spark job fails with a query timeout, and the table (the destination of the insert) is not responding for some time. Our config is:
{ "url": "<URL>", "databaseName": "<DB_NAME>", "dbTable": "<DB_TABLE>" "bulkCopyBatchSize": 100000, "bulkCopyTimeOut": 600, "bulkCopyTableLock": "true", "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver" }
We have tried with more than 1000 partitions as well. Please help us tackle this issue.
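For reference, a sketch of how a config like the one above is typically applied from Spark with explicit repartitioning, again assuming the azure-sqldb-spark connector's API; `df` and the partition count of 1000 are placeholders taken from the description above:

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Options copied from the posted config; placeholders must be filled in.
val config = Config(Map(
  "url"               -> "<URL>",
  "databaseName"      -> "<DB_NAME>",
  "dbTable"           -> "<DB_TABLE>",
  "bulkCopyBatchSize" -> "100000",
  "bulkCopyTimeOut"   -> "600",
  "bulkCopyTableLock" -> "true",
  "driver"            -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"
))

// Each Spark partition opens its own bulk-copy session, so the partition
// count controls the parallelism against the destination table.
df.repartition(1000).bulkCopyToSqlDB(config)
```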