MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Spark 3.1 End Of Support: Impact on Data Flows? #119045

Closed VitalyMCT closed 4 months ago

VitalyMCT commented 8 months ago

The Synapse data flows currently use the Spark 3.1 runtime. There is no way to configure the Spark version there.

What would happen when the runtime reaches EOL next week?

Are there plans to migrate data flows to a newer Spark runtime? If so, when?



AjayBathini-MSFT commented 8 months ago

@VitalyMCT Thanks for your feedback! We will investigate and update as appropriate.

RamanathanChinnappan-MSFT commented 8 months ago

@VitalyMCT Based on the provided document, the Azure Synapse Runtime for Apache Spark 3.1 will reach its end of life on January 26, 2023. After that date, the runtime will not receive bug and feature fixes, but security fixes will be backported based on risk assessment.

Regarding the migration of data flows to a newer Spark runtime, I'm not sure about the specific plans of Microsoft. However, it is recommended to upgrade your Apache Spark 3.1 workloads to version 3.2 or 3.3 at your earliest convenience. You can review the official Spark 3.2 release notes and migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, you can read the Azure Synapse documentation on Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2.

VitalyMCT commented 8 months ago

@RamanathanChinnappan-MSFT

Thanks for your response.

To clarify, the question is specifically about the Spark runtime used for (formerly called "mapping") data flows. Not Spark Pools. Data flows have an internal, Microsoft-controlled Spark runtime. Customers have no control over its versioning and are unable to migrate to 3.2 or beyond.

An example from one of the data flow runs:

image

RamanathanChinnappan-MSFT commented 8 months ago

@VitalyMCT I've delegated this to @ekote, a content author, to review and share their valuable insights.

KrishnakumarRukmangathan commented 5 months ago

@VitalyMCT - Mapping Data Flows Spark runtime migration from 3.1 to 3.3 is ongoing at present. Migration is planned to be completed soon.

VitalyMCT commented 5 months ago

@KrishnakumarRukmangathan Thanks for the update.

Would it be possible to please quantify "soon" in terms of ETA? Is it this month, next month, etc.?

Also, would you have a private or public preview for this? If so, we would be happy to participate.

VitalyMCT commented 4 months ago

@ekote Following up - can this issue please be addressed?

Our customers are currently getting emails that include the following. This is very confusing. Their data flows are on 3.1 and they cannot migrate. What's the impact?

image

bandersmsft commented 4 months ago

Followed up with author.

ekote commented 4 months ago

@VitalyMCT - If the reason for using Azure Synapse Spark Pool 3.1 is DataFlows, then that is correct - no work on your side. The Azure Data Factory team is responsible for migrating customers from 3.1 to 3.3 and then to 3.4.

VitalyMCT commented 4 months ago

@ekote

Regarding

If the reason for using Azure Synapse Spark Pool 3.1 is DataFlows

That's part of the question. What's triggering those alerts? Is it the data flows? We don't have Spark 3.1 pools running deliberately. The only connection seems to be the data flows with their hardcoded version #. The emails are quite unclear.

We've also been awaiting Microsoft's reply to the questions from April 8th in this issue thread.

ekote commented 4 months ago

@VitalyMCT - Today, that thread landed in my inbox, and I replied immediately.

The question you're asking is one for the support team, so I would encourage you to create a support ticket.

Thank you.

WilliamDAssafMSFT commented 4 months ago

@VitalyMCT I agree with @ekote, we can't give any non-public info or other roadmap info here. These questions might have been best answered in an Azure support case. This method - GitHub Issues - is for documentation problems only and has no SLA. Docs really can't make future promises, so we're unlikely to document some of the answers you've asked for, but they may be addressed in future Azure blog posts or Azure updates. I do understand the frustration here, but this communication method isn't what you're looking for, as is evident from the turnaround time.

I recommend Microsoft Q&A (Azure Synapse Analytics), Azure Synapse Analytics feedback, Microsoft Support, or your support partners. These avenues do garner product group attention and can receive official updates.

Regardless, thanks again for this feedback.

WilliamDAssafMSFT commented 4 months ago

please-close

VitalyMCT commented 2 weeks ago

For anyone following this thread, it seems the Data Flow runtime was migrated to 3.3 recently. This is based on pipeline output telemetry showing that Spark version.
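For readers who want to run the same kind of check on their own pipelines, here is a minimal sketch of reading a Spark version out of a data flow activity's output and comparing it against a minimum. The payload shape and the `sparkVersion` key are assumptions made for illustration, not the documented output schema, and the helper names are hypothetical; adjust the key to whatever your own run output actually contains.

```python
import json

# Hypothetical payload: the field names below are illustrative assumptions,
# not the documented schema of the data flow activity output.
SAMPLE_OUTPUT = '{"runStatus": {"sparkVersion": "3.3.1"}}'


def find_spark_version(node):
    """Recursively search parsed JSON for a string-valued 'sparkVersion' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sparkVersion" and isinstance(value, str):
                return value
            found = find_spark_version(value)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_spark_version(item)
            if found is not None:
                return found
    return None


def major_minor(version):
    """Return (major, minor) as integers from a dotted version string."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_at_least(version, minimum):
    """True if version's major.minor is not older than minimum's."""
    return major_minor(version) >= major_minor(minimum)


version = find_spark_version(json.loads(SAMPLE_OUTPUT))
if version is None:
    print("No Spark version found in the output")
else:
    print(version, "meets 3.3:", is_at_least(version, "3.3"))
```

Comparing integer tuples rather than raw strings avoids the trap where "3.10" sorts before "3.3" lexically. A check like this only reports what a given run used; it says nothing about other regions or workspaces.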

In the absence of any communication, it remains unclear whether this upgrade has been completed globally, in selected regions only, or with some other scope.

According to https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-33-runtime, this runtime is supported until March 31st of next year.

There definitely remains a communication gap that needs to be closed for customers to be able to rely on Data Flows in Synapse in production scenarios.