alexszym closed this issue 5 months ago
Thanks for the feedback! We have assigned the issue to the content author to further review this and provide an update as appropriate.
@alexszym Apologies for the long delay. The information you are looking for is available in the following two docs. Let us know if you are looking for something different.
Shows how to create a pipeline with an Apache Spark job definition: https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-transform-data-using-spark-job-definition
Shows how to add an Apache Spark job definition into a pipeline: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-job-definitions#add-an-apache-spark-job-definition-into-pipeline
Hi @SnehaGunda, this is a good start but it's not as detailed as the notebooks documentation. I think these would be really useful sections to enhance the spark job tutorial with:
Hello alexszym, did you find documentation on when to use Spark Jobs? I'm trying to figure out whether I should use Spark Jobs for production rather than Notebooks.
Hi @danpeco, I think the community opinion is split on this, so it may be difficult to find documentation on when to use each. On our side we went with Spark jobs and don't use notebooks in any production setup, only for initial testing. We source control the Spark jobs separately and treat them like any other piece of code. We've found it more difficult to implement good code practices in notebooks themselves. Hope that helps.
@alexszym It is quite helpful; thanks a lot.
Thanks for your dedication to our documentation. Unfortunately, at this time we have been unable to review your issue in a timely manner and we sincerely apologize for the delayed response. We are closing this issue for now, but if you feel that it's still a concern, please respond and let us know. If you determine another possible update to our documentation, please don't hesitate to reach out again. #please-close
The following article has great documentation on how to use Notebooks in Synapse. As the article itself suggests, notebooks are great for validating ideas and getting something working quickly, but not so much for production use.
It would be great to get an equivalent article on how to get Synapse Spark jobs working together with pipelines.