Azure / spark-cdm-connector

MIT License

Support Apache Spark 3.0 #57

Closed TissonMathew closed 1 year ago

TissonMathew commented 3 years ago

Spark 3.0 is still not supported as of 0.18.1 / public preview.

TissonMathew commented 3 years ago

Any progress on this?

SQLArchitect commented 3 years ago

This is unacceptable.

yueguoguo commented 3 years ago

Is this on the roadmap, and if so is there a timeline for this?

SQLArchitect commented 3 years ago

Lack of support for Spark 3 is not acceptable.

bissont commented 3 years ago

I understand the frustration regarding the Spark-CDM-Connector's lack of support for Spark 3.0. We are currently working to add Spark 3.0 support to Synapse and HDI. Once that is done, we can dedicate time to updating our connectors to Spark 3.0.

TissonMathew commented 3 years ago

Looking forward to seeing this. Also, please consider open sourcing this connector.

TissonMathew commented 3 years ago

Gentle reminder on 3.X support....

TissonMathew commented 3 years ago

Checking on this again...

euangms commented 3 years ago

This relies on support for Spark 3.0 on the server side for testing, and that is not done yet.

TissonMathew commented 3 years ago

Thanks for the update @euangms

SQLArchitect commented 3 years ago

When is it going to be done?

euangms commented 3 years ago

No date to share yet

SQLArchitect commented 3 years ago

This is most unnerving and frustrating.

euangms commented 3 years ago

Sorry you feel that way

SQLArchitect commented 3 years ago

It's not just me.

shuyan-huang commented 3 years ago

Does any version support Spark 3.0 now?

euangms commented 3 years ago

Not yet, no.

don4of4 commented 3 years ago

Given the proliferation of Dataverse / CDM export to the lake, doesn't it make sense to get this Spark 3.0 compatible even before Synapse supports it? Many customers have health data estates in Azure Databricks, and can push data into Synapse Pools fairly trivially.

absognety commented 3 years ago

Any update on this? We are using spark-cdm-connector 0.19.1, which still doesn't support Spark 3.0.x.

SQLArchitect commented 3 years ago

And Synapse is now supporting Spark 3.x

johannes-wagner-itvt commented 3 years ago

Please make this available asap!

akshayabnave commented 3 years ago

When is it going to be available?

TissonMathew commented 2 years ago

ETA on this?

aowens-jmt commented 2 years ago

I, too, would like to know when Spark 3.0 support is expected; @billgib @bissont @sricheta92 @euangms can you please help us out here with some insight? Thank you.

rsradulescu commented 2 years ago

Hi! Any update on Spark 3?

SQLArchitect commented 2 years ago

It's sad we still are waiting for this

raymond-au commented 2 years ago

Any news on spark3?

TissonMathew commented 2 years ago

MS team - what's the ETA? If this project is not maintained or can't be open sourced, let the community know.

TissonMathew commented 2 years ago

> Not yet, no.

ETA?

SQLArchitect commented 2 years ago

It seems this project is being abandoned given the lack of commitment and clarity available.

TissonMathew commented 2 years ago

It takes a quick minute to post an update. No response is bad.

raymond-au commented 2 years ago

> I understand the frustration regarding the Spark-CDM-Connector's lack of support for Spark 3.0. We are currently working to add spark 3.0 support to Synapse and HDI. Once we have that done, we can then dedicate time to updating our connectors to Spark 3.0.

Synapse Spark now supports Spark 3, and it doesn't work with CDM. This library isn't installed in Synapse Spark 3.

bissont commented 2 years ago

Hello all,

Thank you for your patience. I wanted to update the community on the status:

TissonMathew commented 2 years ago

@bissont does it support Databricks Spark 3.X?

bissont commented 2 years ago

The code that is merged here is for Spark 3. For Databricks, app registration works, but credential passthrough does not. We are still waiting to hear back from Databricks Engineering about a workaround for credential passthrough on Databricks. We also have a port of SAS token authentication coming to Spark 3 from Spark 2 (already released in Synapse Spark 2).

I'll push the Spark 2 code base to a branch called "spark2" this week.

Credential passthrough doesn't work with Databricks because we added a custom catalog to provide behavior similar to Spark 2: when an entity is written and a manifest doesn't exist, we want the write to succeed.
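To make the working path concrete, here is a minimal write sketch against the Spark 3 build, assuming the connector's documented `com.microsoft.cdm` format and app registration (service principal) auth; the storage account, container, manifest path, entity name, and credential variables are all placeholders, not values from this thread. Per the custom-catalog behavior described above, a write like this should succeed even when no manifest exists yet:

```python
# Sketch: writing a CDM entity with app registration auth on Spark 3.
# With the custom catalog described above, the manifest is created if
# it does not already exist. All names below are hypothetical.
(df.write.format("com.microsoft.cdm")
   .option("storage", "mystorage.dfs.core.windows.net")         # placeholder account
   .option("manifestPath", "mycontainer/default.manifest.cdm.json")
   .option("entity", "Customer")
   .option("appId", app_id)        # service principal (app registration)
   .option("appKey", app_key)
   .option("tenantId", tenant_id)
   .mode("append")
   .save())
```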

TissonMathew commented 2 years ago

@bissont what's the ETA on the next release? Thanks for open sourcing the code.

bissont commented 2 years ago

We don't have a good ETA for the next release, but these are the outstanding items:

aowens-jmt commented 2 years ago

Hi,

Any chance this release can happen by the end of the year?

On Dec 31, 2021 Databricks is retiring support for DBR 6.4 and 5.5, which are the only versions still offering Spark 2.x, and will not allow the creation of new clusters with those versions.

carlo-quinonez commented 2 years ago

Can support for Spark 3 be released before schema evolution? Since Databricks is dropping Spark 2 support in a couple of weeks, keeping existing workflows working should be more important than adding features (schema evolution).

Is SAS token support required for credential-based access control? I ask because we don't use token-based access control, and I'm sure I'm not alone.

Do you think it makes sense to cut a release with Spark 3 support even if it's missing SAS tokens and schema evolution?

avaccariello commented 2 years ago

Hi, I also support the idea of releasing a version compatible with Spark 3 as soon as possible, even without SAS tokens and schema evolution; they can be implemented in a later release. Keeping existing pipelines running in production must be a priority.

aowens-jmt commented 2 years ago

@bissont is anyone from Microsoft even paying attention to these issues and requests? We're not doing this for fun; our clients and projects rely on these tools to work. We, as developers, expect these tools to evolve with the rest of the platforms for which they're designed. So, what's going on here? The fact that no one from the project team is maintaining contact here is as disappointing as the lack of progress on this upgrade.

aowens-jmt commented 2 years ago

So for any of you still waiting for 3.0 support: apparently this was released (very quietly, for some reason) back in January. They've attached a jar which does not yet support pass-through authentication, but the CDM functionality works in 3.0 (and, for me, in the latest Spark 3.2.1). Test and verify it works for you the way you want before releasing to prod.

Link to file: https://github.com/Azure/spark-cdm-connector/blob/master/artifacts/spark-cdm-connector-spark3-assembly-databricks-cred-passthrough-not-working-1.19.2.jar

davetheunissen commented 2 years ago

How would I get this to work without passthrough creds? Should I configure a SAS token for my storage account?

Has anyone got an example notebook they can share for getting this running in Databricks?

TissonMathew commented 2 years ago

@aowens-jmt are you using Databricks or Synapse?

aowens-jmt commented 2 years ago

I'm using Databricks.

charlie-wilson commented 2 years ago

> I'm using Databricks.

Did you ever figure out how to do this with a SAS token? Do you have an example?

TissonMathew commented 2 years ago

All - using this connector has been a terrible experience; not trying to put blame on MS or any of their teams. Their priorities have changed, so we moved on: 1) we adopted Delta / Lakehouse, primarily using Azure Databricks as the data store and compute engine; 2) our metadata store is Cosmos DB (CDM "inspired"; we use the Python / C# libraries as needed); 3) we are adopting Unity Catalog and Databricks SQL for BI.

charlie-wilson commented 2 years ago

It's unfortunate, as Dynamics 365 exports to CDM. It appears to work with the baked-in CDM support in Synapse Spark, but it doesn't on Azure Databricks.

aowens-jmt commented 2 years ago

> I'm using Databricks.
>
> Did you ever figure out how to do this with a SAS token? Do you have an example?

I'm not using a SAS token, but the Service Principal authentication (clientId and clientSecret retrieved from Key Vault; secret scope configured in Databricks). I don't know your specific requirements, but if you are able to use SP-based auth, then I can confirm that it has worked for me.
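For anyone following along, a sketch of the pattern described here: a Key Vault-backed Databricks secret scope feeding the connector's service principal options. The scope and key names are hypothetical:

```python
# Sketch: fetch service principal credentials from a Key Vault-backed
# Databricks secret scope (scope/key names are placeholders), then pass
# them to the connector as appId / appKey / tenantId options.
app_id    = dbutils.secrets.get(scope="kv-scope", key="sp-client-id")
app_key   = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")
tenant_id = dbutils.secrets.get(scope="kv-scope", key="sp-tenant-id")
```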

suraj-shejal commented 2 years ago

Can you share how to read an entity with Azure Databricks?
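For what it's worth, a minimal read sketch for Databricks using the service principal path that commenters above report working. Option names follow the connector's documented API, and every concrete value below is a placeholder:

```python
# Sketch: reading a CDM entity from ADLS Gen2 in Databricks with the
# spark-cdm-connector on Spark 3, using app registration auth.
df = (spark.read.format("com.microsoft.cdm")
      .option("storage", "mystorage.dfs.core.windows.net")      # placeholder
      .option("manifestPath", "mycontainer/default.manifest.cdm.json")
      .option("entity", "Customer")
      .option("appId", "<client-id>")
      .option("appKey", "<client-secret>")
      .option("tenantId", "<tenant-id>")
      .load())
df.show()
```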