apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.
https://graphar.apache.org/
Apache License 2.0
192 stars 40 forks source link

feat(spark): Refactoring datasources #514

Closed SemyonSinchenko closed 3 weeks ago

SemyonSinchenko commented 3 weeks ago

Reason for this PR

By moving datasources under org.apache.spark.sql we are able to access private Spark API. Last time when I was trying to fully migrate datasources to V2 it was a blocker. Detailed motivation is in #493

What changes are included in this PR?

Mostly refactoring.

Are these changes tested?

Unit tests are passed

I manually checked the generated JARs: image

Are there any user-facing changes?

Mostly not because GarDataSource was left under the same package.

Close #493

SemyonSinchenko commented 3 weeks ago

@acezen What do you think about this? For me it looks like there are no changes for public API

acezen commented 3 weeks ago

@acezen What do you think about this? For me it looks like there are no changes for public API

Hi, Sem, thanks for the draft, change to org.apache.spark.sql is good to me. But if we upload the binary package to maven, is that possible to upload all the binary packages to org.apache.graphar?

SemyonSinchenko commented 3 weeks ago

@acezen What do you think about this? For me it looks like there are no changes for public API

Hi, Sem, thanks for the draft, change to org.apache.spark.sql is good to me. But if we upload the binary package to maven, is that possible to upload all the binary packages to org.apache.graphar?

Yes, it should be. For example, check this:

  1. Download delta-saprk: https://mvnrepository.com/artifact/io.delta/delta-spark_2.13/3.2.0
  2. Open it with archive manager
  3. You will see org-apache-spark classes inside: image

but coordinates are still io.delta:delta-spark_2.13:3.2.0

So, if you OK with it I will update 32 too and fix license headers.

acezen commented 3 weeks ago

@acezen What do you think about this? For me it looks like there are no changes for public API

Hi, Sem, thanks for the draft, change to org.apache.spark.sql is good to me. But if we upload the binary package to maven, is that possible to upload all the binary packages to org.apache.graphar?

Yes, it should be. For example, check this:

  1. Download delta-saprk: https://mvnrepository.com/artifact/io.delta/delta-spark_2.13/3.2.0
  2. Open it with archive manager
  3. You will see org-apache-spark classes inside: image

but coordinates are still io.delta:delta-spark_2.13:3.2.0

So, if you OK with it I will update 32 too and fix license headers.

Looks NICE! The change is OK for me. Thanks for the work.

SemyonSinchenko commented 3 weeks ago

Ready for review.