Closed VitalyMCT closed 4 months ago
@VitalyMCT Thanks for your feedback! We will investigate and update as appropriate.
@VitalyMCT Based on the provided document, the Azure Synapse Runtime for Apache Spark 3.1 will reach its end of life on January 26, 2023. After that date, the runtime will not receive bug and feature fixes, but security fixes will be backported based on risk assessment.
Regarding the migration of data flows to a newer Spark runtime, I'm not sure about the specific plans of Microsoft. However, it is recommended to upgrade your Apache Spark 3.1 workloads to version 3.2 or 3.3 at your earliest convenience. You can review the official Spark 3.2 release notes and migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, you can read the Azure Synapse documentation on Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2.
@RamanathanChinnappan-MSFT
Thanks for your response.
To clarify, the question is specifically about the Spark runtime used for (formerly called "mapping") data flows. Not Spark Pools. Data flows have an internal, Microsoft-controlled Spark runtime. Customers have no control over its versioning and are unable to migrate to 3.2 or beyond.
An example from one of the data flow runs:
@VitalyMCT I've delegated this to @ekote, a content author, to review and share their valuable insights.
@VitalyMCT - Mapping Data Flows Spark runtime migration from 3.1 to 3.3 is ongoing at present. Migration is planned to be completed soon.
@KrishnakumarRukmangathan Thanks for the update.
Would it be possible to please quantify "soon" in terms of ETA? Is it this month, next month, etc.?
Also, would you have a private or public preview for this? If so, we would be happy to participate.
@ekote Following up - can this issue please be addressed?
Our customers are currently getting emails that include the following. This is very confusing. Their data flows are on 3.1 and they cannot migrate. What's the impact?
Followed up with author.
@VitalyMCT - If the reason for using Azure Synapse Spark Pool 3.1 is DataFlows, then that is correct - no work on your side. The Azure Data Factory team is responsible for migrating customers from 3.1 to 3.3 and then to 3.4.
@ekote
Regarding "If the reason for using Azure Synapse Spark Pool 3.1 is DataFlows":
That's part of the question. What's triggering those alerts? Is it the data flows? We don't have Spark 3.1 pools running deliberately. The only connection seems to be the data flows with their hardcoded version #. The emails are quite unclear.
We've also been awaiting Microsoft's reply to the questions from April 8th in this issue thread.
@VitalyMCT - Today, that thread landed in my inbox, and I replied immediately.
The question you're asking is one for the support team, so I would encourage you to create a support ticket.
Thank you.
@VitalyMCT I agree with @ekote, we can't give any non-public info or other roadmap info here. These questions might have been best answered in an Azure support case. This method - GitHub Issues - is for documentation problems only and has no SLA. Docs really can't make future promises, so we're unlikely to document some of the answers you've asked for, but they may be addressed in future Azure blog posts or Azure updates. I do understand the frustration here, but this communication method isn't what you're looking for, as is evident from the turnaround time.
I recommend Microsoft Q&A (Azure Synapse Analytics), Azure Synapse Analytics feedback, Microsoft Support, or your support partners. These avenues get more product group attention and can receive official updates.
Regardless, thanks again for this feedback.
For anyone following this thread, it seems the Data Flow runtime was migrated to 3.3 recently. This is based on pipeline output telemetry showing that Spark version.
In the absence of any communication, it remains unclear whether this upgrade has been completed globally, in selected regions only, or with some other scope.
According to https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-33-runtime, this runtime provides supportability up to March 31st of next year.
There definitely remains a communication gap that needs to be closed for customers to be able to rely on Data Flows in Synapse in production scenarios.
The Synapse data flows currently use the Spark 3.1 runtime. There is no way to configure the Spark version there.
What would happen when the runtime reaches EOL next week?
Are there plans to migrate data flows to a newer Spark runtime? If so, when?