MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Anaconda versioning in Synapse Spark pools #103298

Closed VitalyMCT closed 4 months ago

VitalyMCT commented 1 year ago

Can some details please be added around this important aspect?

As an example, Spark 3.3 (preview) pools currently include the pandas 1.2.3 library (shown in a screenshot attached to the original issue).

According to https://pypi.org/project/pandas/1.2.3/#history that version was released on March 2, 2021.

It's unclear to me when 1.2.3 was brought into Anaconda, as I cannot find any reference to 1.2.3 on Anaconda's pages. However, it was probably on or before Nov 17, 2021 (see https://docs.continuum.io/anaconda/reference/release-notes): that page references an upgrade from 1.2.4 for at least one target OS platform.

To continue with this example, the current Anaconda distribution includes a much newer version of pandas.
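For anyone wanting to confirm which version a given pool actually ships, a minimal sketch along these lines could be run in a notebook cell attached to the pool. It only uses the Python standard library; the helper names `installed_version` and `release_tuple` are hypothetical and not part of any Synapse API, and the comparison assumes plain numeric versions such as `1.2.3`.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.2.3' into (1, 2, 3).

    Assumption: every segment is numeric. Pre-release or local tags
    (for example '2.0.0rc1') are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    found = installed_version("pandas")
    print("pandas installed:", found)
    # 1.2.3 is the version quoted in this issue for Spark 3.3 (preview) pools.
    if found is not None and found != "1.2.3":
        print("differs from the version quoted in the issue")
```

Comparing tuples rather than raw strings avoids ordering mistakes such as `"1.10.0" < "1.2.3"`, which is true for strings but false for the versions they represent.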

So the questions include:



SaibabaBalapur-MSFT commented 1 year ago

@VitalyMCT Thanks for your feedback! We will investigate and update as appropriate.

RamanathanChinnappan-MSFT commented 1 year ago

@VitalyMCT

I've delegated this to content author @MikeRys to review and share his valuable insights.

MikeRys commented 1 year ago

I am sorry, but I am generally not the right person for Synapse questions. In this case, all I can say is that each Spark version we release in Synapse normally comes with a predefined set of built-in libraries, and each library's version is fixed for that Spark version. The libraries and their versions are documented as follows:

  1. The Spark versions are documented at https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-version-support.
  2. This page links to more details about each version and their supported libraries. For example, Spark v3.3 is at https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-33-runtime.

Finally, there is documentation about library management and how to add your own libraries at https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries.
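As a rough illustration of how the documented versions and the library management page fit together, the sketch below compares a hand-copied set of expected versions against what is installed and formats the pins as pip-style `name==version` lines. It assumes the pool accepts a pip-style requirements file as described on the library management page. The function names and the `expected` dictionary are hypothetical; only the pandas 1.2.3 entry comes from this thread, and any other entries would need to be copied from the runtime page by hand.

```python
from importlib import metadata
from typing import Dict, List, Optional


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return packages whose installed version differs from `expected`.

    The value is the installed version, or None if the package is missing.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            found: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches


def to_requirements(pins: Dict[str, str]) -> List[str]:
    """Format pins as pip-style lines, sorted by package name."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


if __name__ == "__main__":
    # Hypothetical input: versions copied manually from the runtime page.
    expected = {"pandas": "1.2.3"}
    print("mismatches:", find_mismatches(expected))
    print("\n".join(to_requirements(expected)))
```

Running this in a session shows whether the pool matches the documented versions before any custom pins are uploaded.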

bandersmsft commented 4 months ago

Thanks for your dedication to our documentation. Unfortunately, at this time we have been unable to review your issue in a timely manner and we sincerely apologize for the delayed response. We are closing this issue for now, but if you feel that it's still a concern, please respond and let us know. If you determine another possible update to our documentation, please don't hesitate to reach out again. #please-close