Azure / azure-sqldb-spark

This project provides a client library that allows Azure SQL DB or SQL Server to act as an input source or output sink for Spark jobs.
MIT License
76 stars 52 forks source link

Table is not responding in bulkCopyToSqlDB #59

Closed demoV closed 4 years ago

demoV commented 4 years ago

Hi,

We are processing ~400M records. when we are trying to bulk insert records the Spark job is failing with query time out. And also the table (Destination to insert) is not responding for some time.

our config is

{ "url": "<URL>", "databaseName": "<DB_NAME>", "dbTable": "<DB_TABLE>" "bulkCopyBatchSize": 100000, "bulkCopyTimeOut": 600, "bulkCopyTableLock": "true", "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver" } We have tried with more than 1000 partitions as well.

Please help to tackle this issue.

MironAtHome commented 4 years ago

Have you tried to explore what exactly causes timeout? Is it possible table was in use and query, as per above, could not secure possession of exclusive lock on underlying table? Regarding "the table (Destination to insert) is not responding for some time." that would probably be expected, since you are using table lock. If you need to load data and use table simultaneously I recommend removing option "bulkCopyTableLock": "true" and reducing size of batch. This will lengthen loading time, but will be more friendly to what you are trying to achieve. If your requirements more stringent, please consider partitioning table loaded from Spark.

ghost commented 4 years ago

Does your destination table have indexes? I had performance issue for large tables with indexes.

Droping indexes before bulk insert and re-creating indexes at the end helped me improve performance significantly. I was able to bulk insert 180 million records within 23 mins to a heap. Azure SQL elastic pool with general purpose 8 v cores and cluster with 8 work nodes of DS12v2

Link below would be useful: https://techcommunity.microsoft.com/t5/datacat/turbo-boost-data-loads-from-spark-using-sql-spark-connector/ba-p/305523#

arvindshmicrosoft commented 4 years ago

@demoV as your question is quite environment-specific it is difficult to give prescriptive guidance. The general tips I would mention are:

Please let me know if there are any other questions, else I would like to close this thread by July 3, 2020.

arvindshmicrosoft commented 4 years ago

Closing per previous comment and subsequent inactivity.