How to use Synapse spark jobs

MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure

https://docs.microsoft.com/azure

Creative Commons Attribution 4.0 International


How to use Synapse spark jobs #91091

Closed alexszym closed 5 months ago

alexszym commented 2 years ago

The following article has great documentation on how to use notebooks in Synapse. As the article itself suggests, notebooks are great for validating ideas and getting something working quickly, but not so much for defining production usage.

It would be great to get an equivalent article for how to get Synapse spark jobs working together with the pipelines.

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: e188fe7f-8cc8-2f0a-2ff5-c0eca25bcdce
Version Independent ID: a8ec994c-1eaa-6c98-6916-bcf4ef401757
Content: How to use Synapse notebooks - Azure Synapse Analytics
Content Source: articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md
Service: synapse-analytics
Sub-service: spark
GitHub Login: @ruixinxu
Microsoft Alias: ruxu

KranthiPakala-MSFT commented 2 years ago

Thanks for the feedback! We have assigned the issue to the content author to further review this and provide an update as appropriate.

SnehaGunda commented 2 years ago

@alexszym Apologies for the long delay. The information you are looking for is available in the following two docs. Let us know if you are looking for something different.

Shows how to create a pipeline with an Apache Spark job definition: https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-transform-data-using-spark-job-definition

Shows how to add an Apache Spark job definition into pipeline: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-job-definitions#add-an-apache-spark-job-definition-into-pipeline

alexszym commented 2 years ago

Hi @SnehaGunda, this is a good start but it's not as detailed as the notebooks documentation. I think these would be really useful sections to enhance the spark job tutorial with:

how to add logging to a spark job
how to read in secrets
configuring spark context
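
To make the first and third items concrete, here is a minimal sketch of a PySpark job file with logging and Spark configuration. It is an illustration under stated assumptions, not the documented Synapse approach: the job name `example_job`, the `--input`, `--output`, and `--shuffle-partitions` arguments, and the helper names are all hypothetical, and it assumes that output written to stdout ends up in the driver log. Secret retrieval is left out because it depends on how the workspace is linked to a key vault.

```python
import argparse
import logging
import sys


def build_logger(name="example_job", level="INFO"):
    """Return a logger that writes to stdout (assumed to reach the driver log)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def parse_args(argv):
    """Parse the command-line arguments passed to the job (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Example Spark job")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--shuffle-partitions", type=int, default=200)
    return parser.parse_args(argv)


def spark_conf(args):
    """Translate job arguments into Spark configuration key/value pairs."""
    return {"spark.sql.shuffle.partitions": str(args.shuffle_partitions)}


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    log = build_logger()

    # Imported here so the helpers above can be unit tested without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("example_job")
    for key, value in spark_conf(args).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    log.info("Reading %s", args.input)
    df = spark.read.parquet(args.input)
    df.write.mode("overwrite").parquet(args.output)
    log.info("Wrote %s", args.output)
    spark.stop()

# A real job file would end with:
#     if __name__ == "__main__":
#         main()
```

Keeping argument parsing and configuration in plain functions means they can be tested locally without a Spark pool; only `main` needs a Spark runtime.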

danpeco commented 1 year ago

Hello alexszym, did you find documentation on when to use Spark jobs? I'm trying to figure out whether I should use Spark jobs rather than notebooks for production.

alexszym commented 1 year ago

Hi @danpeco, I think community opinion is split on this, so it may be difficult to find documentation on when to use each. On our side, we went with Spark jobs and don't use notebooks in any production setup, only for initial testing. We source-control the Spark jobs separately and treat them like any other piece of code. We've found it more difficult to apply good coding practices in notebooks themselves. Hope that helps.

danpeco commented 1 year ago

@alexszym It is quite helpful; thanks a lot.

bandersmsft commented 5 months ago

Thanks for your dedication to our documentation. Unfortunately, at this time we have been unable to review your issue in a timely manner and we sincerely apologize for the delayed response. We are closing this issue for now, but if you feel that it's still a concern, please respond and let us know. If you determine another possible update to our documentation, please don't hesitate to reach out again.