Azure / azure-sqldb-spark

This project provides a client library that allows Azure SQL DB or SQL Server to act as an input source or output sink for Spark jobs.
MIT License
76 stars 52 forks

Azure databricks scheduled job failure intermittently object sqldb is not a member of package com.microsoft.azure #81

Closed SubratPanigrahi closed 4 years ago

SubratPanigrahi commented 4 years ago

Hi All,

Our scheduled job in Azure Databricks fails intermittently with the error message below, but it doesn't fail when we run the notebook manually. We have attached an interactive cluster to the notebook. Not sure why it fails intermittently.

error: object sqldb is not a member of package com.microsoft.azure
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata

Any suggestions would be appreciated!

rworbis commented 4 years ago

Hi, we have the same problem.

error: object sqldb is not a member of package com.microsoft.azure
import com.microsoft.azure.sqldb.spark.connect._

When the cluster is started manually, everything works: running the notebook manually as well as starting the job. When the cluster is started by the scheduler, the error occurs in about 90% of cases.

Any ideas?

tkasu commented 4 years ago

Does this error occur if you are using a data engineering cluster instead of an interactive cluster?

I have had this same problem with interactive clusters. It occurs because Databricks installs libraries asynchronously on interactive clusters after cluster start, so the cluster may execute that import statement before azure-sqldb-spark has been installed.

So I think the solution is either to use a data engineering cluster (if the slower start time is acceptable), or to add some logic that waits for the cluster's library installs to finish before running the imports. For example, it is possible to poll the cluster's library installation status via the Databricks API.
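As a rough sketch of that polling idea: the Databricks Libraries API has a cluster-status endpoint that reports per-library install state. The snippet below is a minimal, hypothetical example, not an official recipe; the workspace URL, token, and cluster ID are placeholders you would substitute, and error handling is kept to a bare minimum. It could run in a cell before the cell that does the imports.

```python
import json
import time
import urllib.request


def all_libraries_installed(library_statuses):
    """True when every library on the cluster reports status INSTALLED."""
    return all(s.get("status") == "INSTALLED" for s in library_statuses)


def wait_for_libraries(host, token, cluster_id, timeout_s=600, poll_s=10):
    """Poll the Databricks Libraries API (GET /api/2.0/libraries/cluster-status)
    until all cluster libraries are installed, or raise TimeoutError."""
    url = f"{host}/api/2.0/libraries/cluster-status?cluster_id={cluster_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with urllib.request.urlopen(req) as resp:
            statuses = json.load(resp).get("library_statuses", [])
        if statuses and all_libraries_installed(statuses):
            return
        time.sleep(poll_s)
    raise TimeoutError("cluster libraries did not finish installing in time")


# Placeholders -- substitute your own workspace URL, token, and cluster ID:
# wait_for_libraries("https://<workspace>.azuredatabricks.net", "<token>", "<cluster-id>")
```

Once the wait returns, the subsequent cell's `import com.microsoft.azure.sqldb.spark.connect._` should find the library on the classpath.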

rworbis commented 4 years ago

The cluster is the same, just once started manually, once by a scheduled job.

tkasu commented 4 years ago

@rworbis if you monitor your interactive cluster's "Libraries" tab status indicators while the scheduled job is starting, you will probably notice that the failure occurs before the installation of azure-sqldb-spark has completed.

If that is the case, this is a Databricks issue.

arvindshmicrosoft commented 4 years ago

The suggestions from @tkasu are probably the best hint the OP can get on this context-specific issue. Closing the issue, as it is unlikely that anyone from the community can provide more prescriptive advice.