@yecol Our goal is to support multiple versions of Apache Spark. Unfortunately, the DataSource API of Apache Spark is a Developer API that changes dramatically from one Spark version to another, and sometimes the changes are so large that reflection is not enough.
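As a minimal sketch of why reflection cannot always bridge the gap (all names below are invented for illustration; this is not real Spark code): when a developer interface gains a new abstract member between versions, a class compiled against the old interface can no longer be linked against the new one, and no reflective call can fix that.

```scala
// Hypothetical illustration only, not actual Spark code.

object FrameworkV1 {
  // In "version 1" the developer interface has one abstract method.
  trait DataSourceScan {
    def read(): Iterator[String]
  }
}

object FrameworkV2 {
  // In "version 2" the same interface gained a second abstract method.
  // A class compiled against version 1 does not implement it, so loading
  // it against version 2 fails at link time (AbstractMethodError), long
  // before any reflective workaround could run.
  trait DataSourceScan {
    def read(): Iterator[String]
    def readSchema(): Seq[String] // new abstract member in version 2
  }
}

// An implementation compiled against version 1 of the interface:
class MyScan extends FrameworkV1.DataSourceScan {
  override def read(): Iterator[String] = Iterator("row1", "row2")
}
```

Compiling the datasource once per Spark version, as the profiles below do, avoids this class of problem entirely.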
We decided to separate the datasource implementation into a Maven subpackage per Spark version, and we have the following Maven profiles:
```xml
<profiles>
  <profile>
    <id>datasources-32</id>
    <properties>
      <sbt.project.name>graphar</sbt.project.name>
      <spark.version>3.2.4</spark.version>
    </properties>
    <modules>
      <module>graphar</module>
      <module>datasources-32</module>
    </modules>
  </profile>
  <profile>
    <id>datasources-33</id>
    <properties>
      <sbt.project.name>graphar</sbt.project.name>
      <spark.version>3.3.4</spark.version>
    </properties>
    <modules>
      <module>graphar</module>
      <module>datasources-33</module>
    </modules>
  </profile>
  <profile>
    <id>datasources-34</id>
    <properties>
      <sbt.project.name>graphar</sbt.project.name>
      <spark.version>3.4.3</spark.version>
    </properties>
    <modules>
      <module>graphar</module>
      <module>datasources-34</module>
    </modules>
  </profile>
  <profile>
    <id>datasources-35</id>
    <properties>
      <sbt.project.name>graphar</sbt.project.name>
      <spark.version>3.5.1</spark.version>
    </properties>
    <modules>
      <module>graphar</module>
      <module>datasources-35</module>
    </modules>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
</profiles>
```
Each of these subfolders is actually a Maven subproject. Using that approach, we are able to build GraphAr Spark against different versions of Spark itself. At the moment this approach is used in our CI, where we run the tests for all supported Maven profiles.
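Concretely, selecting one of the profiles above is a standard Maven invocation; the profile ids come from the POM shown earlier, and `datasources-35` is marked `activeByDefault`:

```bash
# Build and test against Spark 3.3 (profile id from the POM above)
mvn clean test -P datasources-33

# Without -P, the activeByDefault profile (datasources-35 / Spark 3.5.1) is used
mvn clean test

# Inspect which profiles are active for the current build
mvn help:active-profiles
```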
An alternative way to provide this support would be tags or branches, but to me Maven sub-projects are preferable. At any given moment about 4-5 versions of Spark are maintained, so I don't expect the amount of duplicated code to grow without bound: Spark 3.2 reaches end of life soon, for example, so we will be able to drop it.
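For reference, running the test matrix over these profiles in CI could look like the following GitHub Actions sketch (an illustrative assumption, not the project's actual workflow):

```yaml
# Illustrative sketch only; not the project's actual CI configuration.
name: spark-matrix
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        profile: [datasources-32, datasources-33, datasources-34, datasources-35]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '11'
      - run: mvn clean test -P ${{ matrix.profile }}
```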
I see, it makes sense! I wasn't aware of the diverged DataSource versions of Spark. Thanks for your kind and detailed response!
Describe the bug, including details regarding any error messages, version, and platform.

I just checked out the latest code and found something weird. As shown in the figure above, it seems some code lives under the directories `datasources-34` and `datasources-35`. Should it be moved to `maven-projects/spark/src/main` if it is library code, or to `maven-projects/spark/src/test` if it is test-specific code?

Component(s)

Spark