Closed SemyonSinchenko closed 3 weeks ago
One problem with moving to org.apache.spark is that we could not release the datasource Maven packages under the org.apache.spark namespace in the future.
It may also simplify full support of v2 datasources. Last time, when I tried to switch to v2, I ran into the problem that a few important methods are private[sql].
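The private[sql] restriction is about Scala's package-qualified visibility: code placed anywhere under the qualifying package (including sub-packages) can still see such members, which is why moving classes under org.apache.spark.sql would help. A minimal, self-contained sketch of the mechanism, using a hypothetical org.example.sql package in place of Spark's:

```scala
package org.example.sql {
  // Visible only to code inside org.example.sql and its sub-packages,
  // analogous to Spark's private[sql] JSONOptions.
  private[sql] class Options(val value: String)
}

package org.example.sql.datasource {
  // This package lives under org.example.sql, so it can still
  // instantiate the package-private Options class.
  object Reader {
    def describe(): String = new org.example.sql.Options("parsed").value
  }
}

package demo {
  object Main extends App {
    println(org.example.sql.datasource.Reader.describe())
  }
}
```

Code outside the org.example.sql tree (for GraphAr today, anything under org.apache.graphar) gets a compile error when it touches Options, which mirrors the problem described above.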
But do we need it? We could release graphar itself, and we could release a PyPI package that already includes the JARs from the datasources. Should the datasources be released separately at all?
The graphar package relies on datasource; I don't see how we can release graphar without releasing the datasource package.
It would be a compile-time dependency, like in delta-spark: https://github.com/delta-io/delta/tree/master/spark/src/main/scala/org/apache/spark/sql
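One way to read "compile-time dependency" here, sketched below with hypothetical module coordinates: graphar declares the datasource module with the default compile scope and bundles its classes into its own jar via the shade plugin, so no separate datasource artifact has to be published.

```xml
<!-- Hypothetical sketch of graphar's pom.xml; the module names and
     shade configuration are illustrative, not the project's actual build. -->
<dependency>
  <groupId>org.apache.graphar</groupId>
  <artifactId>graphar-datasource</artifactId>
  <version>${project.version}</version>
  <scope>compile</scope>
</dependency>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <!-- Bundle only the datasource classes into the graphar jar. -->
            <artifactSet>
              <includes>
                <include>org.apache.graphar:graphar-datasource</include>
              </includes>
            </artifactSet>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```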
It seems that, as a compile-time dependency, we would need to move datasource back into maven-projects/graphar? Or do you know of a way for datasource to stay separate from graphar while still being a compile-time dependency? That would be perfect for this issue.
I can experiment with it. It seems to me that yes, it should be possible.
Great, thanks Sem. Can I assign this issue to you?
Describe the enhancement requested
Currently, the datasources implementation is under the org.apache.graphar namespace. I suggest moving these classes to the org.apache.spark package. That would allow us to use Spark internals; for example, it resolves the problem in #488, where JSONOptions is private[sql].
Component(s)
Spark, PySpark