Azure / azure-sqldb-spark

This project provides a client library that allows Azure SQL DB or SQL Server to act as an input source or output sink for Spark jobs.
MIT License
75 stars 52 forks source link

Error "SQLServerException: The connection is closed" on writing Spark Dataframe to Azure SQL #57

Closed: prapanw closed this issue 4 years ago

prapanw commented 4 years ago

Hi, I am using `com.microsoft.azure:azure-sqldb-spark:1.0.2` to write a Spark DataFrame (50K+ rows, 6 columns) to my Azure SQL database.

I am using the following method: `dataDF.write.mode(SaveMode.Append).sqlDB(config)`, with `queryTimeout` set to a high value (6000 s).
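For context, a minimal sketch of this write path using the azure-sqldb-spark API (the server, database, table, and credential values below are placeholders, not the actual ones in use):

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import org.apache.spark.sql.SaveMode

// Placeholder connection settings; queryTimeout matches the high value mentioned above (seconds).
val config = Config(Map(
  "url"          -> "myserver.database.windows.net",
  "databaseName" -> "MyDatabase",
  "dbTable"      -> "dbo.MyTable",
  "user"         -> "username",
  "password"     -> "**********",
  "queryTimeout" -> "6000"
))

// Append the DataFrame to the target table via a standard JDBC write.
dataDF.write.mode(SaveMode.Append).sqlDB(config)
```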

Any ideas why it might be failing? The stack trace is below.

```
Exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 242 in stage 84850.0 failed 4 times, most recent failure: Lost task 242.3 in stage 84850.0 (TID 12743887, 10.139.64.24, executor 344): com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:796)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:2698)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:713)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:987)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2321)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2321)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
	at org.apache.spark.scheduler.Task.run(Task.scala:113)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:533)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:539)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

arvindshmicrosoft commented 4 years ago

I'm sorry, this kind of issue is typically very context-specific and needs to be debugged and isolated first; I'm not sure it is specific to the connector. Also note that this connector is no longer actively maintained. I recommend switching to the newer one.
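The comment doesn't name the replacement, but the repository README points to the Apache Spark Connector for SQL Server and Azure SQL (https://github.com/microsoft/sql-spark-connector). Assuming that is the connector meant here, a minimal equivalent append write looks roughly like this (connection values are placeholders):

```scala
// Requires the spark-mssql-connector artifact on the classpath.
// All URL, table, and credential values below are placeholders.
dataDF.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=MyDatabase")
  .option("dbtable", "dbo.MyTable")
  .option("user", "username")
  .option("password", "**********")
  .save()
```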

P.S. Feel free to reach out internally at MSFT if you'd like to troubleshoot this together.