MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Is the Azure Synapse Spark SQL Connector available in Maven? #79802

Closed cw18-coder closed 5 months ago

cw18-coder commented 3 years ago

Hi,

I have created a Spark on Azure Synapse project in IntelliJ IDEA. From that project, I would like to read external tables in a dedicated SQL pool into a Spark DataFrame and continue with further downstream processing.

Could you please help me update the POM file to include the dependencies that provide the following two imports mentioned in the article:

  1. com.microsoft.spark.sqlanalytics.utils.Constants
  2. org.apache.spark.sql.SqlAnalyticsConnector._

I can't find either of these in Maven, and the article provides no links or instructions.

I prefer to develop code in an IDE rather than the Synapse Workspace Notebooks.

Regards, Clarence



CHEEKATLAPRADEEP-MSFT commented 3 years ago

@cw18-coder Thanks for the question! We are investigating and will update you shortly.

CHEEKATLAPRADEEP-MSFT commented 3 years ago

@cw18-coder Unfortunately, the above libraries are not available externally.

Apache Spark in Azure Synapse Analytics has a full set of libraries for common data engineering, data preparation, machine learning, and data visualization tasks. The full list of libraries can be found at Apache Spark version support.

These libraries are only available in the Azure Synapse runtimes, as shown below:

  - Azure Synapse Runtime for Apache Spark 2.4: sqlanalyticsconnector-1.0.9.2.6.99.201-34744923.jar
  - Azure Synapse Runtime for Apache Spark 3.0: sqlanalyticsconnector-1.1.jar

If you want to use these libraries, you need to first connect to your Spark pool from IntelliJ and then run the Scala commands.

For more information, refer to Tutorial: Create an Apache Spark application with IntelliJ using a Synapse workspace.

Hope this helps.

cw18-coder commented 3 years ago

Hi @CHEEKATLAPRADEEP-MSFT thanks for providing me with an update.

I am familiar with the Azure Toolkit for IntelliJ and use it extensively. Does your response mean that I can only access these libraries from the interactive console within IntelliJ after connecting to the Synapse Spark pool?

The problem is that whether I connect to the Spark pool to submit a job in batch mode or in interactive mode, an artifact (in this case a JAR) needs to be compiled and sent across. This cannot happen: if I use the import statements in a Scala source file, the project will not build because the dependencies are not recognized.
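A general JVM workaround for this kind of compile-time gap, offered as an assumption rather than a documented Synapse approach, is to keep the connector out of the compile step and detect it at runtime. The sketch below only probes for one class named in the imports quoted earlier; `ConnectorProbe` and `isOnClasspath` are hypothetical names. Run locally, it should report the class as missing; on a Synapse Spark pool whose runtime ships the connector JAR, it is expected to find it.

```scala
// Hypothetical helper (not part of any Synapse API): checks at runtime whether
// a class can be loaded, without the project referencing it at compile time.
object ConnectorProbe {
  // Fully qualified name taken from the import quoted in this issue.
  val ConstantsClass: String = "com.microsoft.spark.sqlanalytics.utils.Constants"

  def isOnClasspath(className: String): Boolean =
    try {
      // initialize = false: locate the class without running its static initializers
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  def main(args: Array[String]): Unit =
    println(s"Synapse connector classes available: ${isOnClasspath(ConstantsClass)}")
}
```

A probe like this does not by itself make the connector's API callable from IDE-built code; it only lets one JAR choose at runtime between a path that depends on the connector and one that does not. Passing `initialize = false` keeps the check free of side effects from static initializers.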

It's the classic rock-and-a-hard-place problem.

Is there a multi-step process I can use to achieve what I'm after? From what I have surmised, the connector wraps multiple steps to achieve the desired result.
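If the goal is only to get dedicated SQL pool tables into a DataFrame from code that builds in an IDE, one alternative that needs no Synapse-only JAR is Spark's built-in JDBC data source together with the Microsoft JDBC Driver for SQL Server (published on Maven Central as `com.microsoft.sqlserver:mssql-jdbc`). This is a swapped-in technique, not what the connector does internally. The sketch below only assembles the reader options. `SynapseJdbc`, `jdbcUrl`, `readOptions` and all workspace, database and table names are hypothetical, SQL authentication is assumed, and the host assumes the usual `<workspace>.sql.azuresynapse.net` endpoint for dedicated SQL pools.

```scala
// Hypothetical helper: builds options for Spark's built-in JDBC data source
// so a dedicated SQL pool table can be read without the Synapse connector.
object SynapseJdbc {
  // Driver class shipped in com.microsoft.sqlserver:mssql-jdbc
  val DriverClass: String = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

  // Assumes the standard dedicated SQL pool endpoint: <workspace>.sql.azuresynapse.net
  def jdbcUrl(workspace: String, database: String): String =
    s"jdbc:sqlserver://$workspace.sql.azuresynapse.net:1433;" +
      s"database=$database;encrypt=true;trustServerCertificate=false;" +
      "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;"

  // "url", "dbtable", "user", "password" and "driver" are standard Spark JDBC options.
  def readOptions(workspace: String, database: String, table: String,
                  user: String, password: String): Map[String, String] =
    Map(
      "url"      -> jdbcUrl(workspace, database),
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password,
      "driver"   -> DriverClass
    )
}
```

On the cluster, the map would be passed to `spark.read.format("jdbc").options(opts).load()`. Without `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions`, Spark reads the table through a single JDBC connection, so large tables are likely to load more slowly than through the connector.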

If you would like to chat over Teams my alias is clwong.

Regards, Clarence

CHEEKATLAPRADEEP-MSFT commented 3 years ago

@cw18-coder We are reaching out to the internal team to get more help on this, I will update you once we hear back from them.

CHEEKATLAPRADEEP-MSFT commented 3 years ago

@euangms Could you please look into the above ask?

julieMSFT commented 3 years ago

assign:mlee3gsd

sudo-pradip commented 2 years ago

Any updates?

SnehaGunda commented 2 years ago

reassign:mlee3gsd

manoharant commented 2 years ago

Any update on this, guys?

rjof commented 1 year ago

any update on this?

bandersmsft commented 5 months ago

Thanks for your dedication to our documentation. Unfortunately, at this time we have been unable to review your issue in a timely manner and we sincerely apologize for the delayed response. We are closing this issue for now, but if you feel that it's still a concern, please respond and let us know. If you determine another possible update to our documentation, please don't hesitate to reach out again. #please-close