apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.
https://graphar.apache.org/
Apache License 2.0
192 stars, 40 forks

[Spark] Move datasources under org.apache.spark package #493

SemyonSinchenko closed this issue 3 weeks ago

SemyonSinchenko commented 1 month ago

Describe the enhancement requested

Currently the datasource implementations live under the org.apache.graphar namespace. I suggest moving these classes into the org.apache.spark package. That would let us use Spark internals; for example, it resolves the problem in #488, where JSONOptions is private[sql].
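To illustrate the visibility rule behind this suggestion, here is a minimal sketch that uses stand-in packages instead of Spark itself. All names in it (`org.example.engine.sql`, `InternalOptions`, `DataSourceSketch`) are hypothetical; `InternalOptions` only plays the role of a `private[sql]` class such as JSONOptions and does not reproduce its real constructor.

```scala
// Stand-in for the engine: the class is visible only inside the
// `sql` package and the packages nested under it.
package org.example.engine.sql {
  private[sql] class InternalOptions(val parameters: Map[String, String]) {
    // Hypothetical option lookup, only here to give the class some behavior.
    def multiLine: Boolean =
      parameters.getOrElse("multiLine", "false").toBoolean
  }
}

// Stand-in for a datasource placed under the engine's `sql` package.
// Because this package is nested inside `sql`, the private[sql] class
// is accessible here.
package org.example.engine.sql.graphar {
  import org.example.engine.sql.InternalOptions

  object DataSourceSketch {
    def isMultiLine(options: Map[String, String]): Boolean =
      new InternalOptions(options).multiLine
  }
}

// Stand-in for the current layout: a package outside `sql`.
package org.example.graphar {
  object OutsideSketch {
    // Uncommenting the next line fails to compile, because
    // InternalOptions is not accessible from this package:
    // val opts = new org.example.engine.sql.InternalOptions(Map.empty)
  }
}
```

The same rule is why code declared under org.apache.spark.sql could reach `private[sql]` members, while code under org.apache.graphar cannot.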

Component(s)

Spark, PySpark

acezen commented 1 month ago

One problem with moving to org.apache.spark is that we would not be able to release the datasource Maven packages under org.apache.spark in the future.

SemyonSinchenko commented 1 month ago

It may also simplify full support for v2 datasources. The last time I tried to switch to v2, I ran into the problem that a few important methods are private[sql].

SemyonSinchenko commented 1 month ago

> One problem with moving to org.apache.spark is that we would not be able to release the datasource Maven packages under org.apache.spark in the future.

But do we need that? We can release graphar itself, and we can release a PyPI package that already includes the datasource JARs. Do the datasources need to be released separately?

acezen commented 1 month ago

> > One problem with moving to org.apache.spark is that we would not be able to release the datasource Maven packages under org.apache.spark in the future.

> But do we need that? We can release graphar itself, and we can release a PyPI package that already includes the datasource JARs. Do the datasources need to be released separately?

The graphar package relies on the datasource. I don't see how we can release graphar without releasing the datasource package.

SemyonSinchenko commented 1 month ago

> The graphar package relies on the datasource. I don't see how we can release graphar without releasing the datasource package.

It will be a compile-time dependency, for example as in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

acezen commented 1 month ago

> > The graphar package relies on the datasource. I don't see how we can release graphar without releasing the datasource package.

> It will be a compile-time dependency, for example as in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

It seems that, as a compile-time dependency, the datasource would have to go back into maven-projects/graphar? Or do you know of a way to keep the datasource separate from graphar and still have it as a compile-time dependency? That would be perfect for this issue.

SemyonSinchenko commented 1 month ago

> > > The graphar package relies on the datasource. I don't see how we can release graphar without releasing the datasource package.

> > It will be a compile-time dependency, for example as in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

> It seems that, as a compile-time dependency, the datasource would have to go back into maven-projects/graphar? Or do you know of a way to keep the datasource separate from graphar and still have it as a compile-time dependency? That would be perfect for this issue.

I can experiment with it. It seems to me that yes, it should be possible.

acezen commented 1 month ago

> > > > The graphar package relies on the datasource. I don't see how we can release graphar without releasing the datasource package.

> > > It will be a compile-time dependency, for example as in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql

> > It seems that, as a compile-time dependency, the datasource would have to go back into maven-projects/graphar? Or do you know of a way to keep the datasource separate from graphar and still have it as a compile-time dependency? That would be perfect for this issue.

> I can experiment with it. It seems to me that yes, it should be possible.

Great, thanks Sem. Can I assign this issue to you?