weird source code structure (maybe bug?)

yecol commented 3 months ago

Describe the bug, including details regarding any error messages, version, and platform.

I just checked out the latest code and found something weird. As shown in the figure above, some code seems to live under the directories datasources-34 and datasources-35.

Should they be in maven-projects/spark/src/main if they are library code?
Or should they be in maven-projects/spark/src/test if they are test-specific code?

Component(s)

Spark

SemyonSinchenko commented 3 months ago

@yecol Our goal is to support multiple versions of Apache Spark. Unfortunately, the DataSource API of Apache Spark is a Developer API, and it changes dramatically from one Spark version to another. Sometimes the changes are so big that reflection is not enough.

We decided to split the datasource implementation into separate Maven submodules, one per supported Spark version.

We have the following Maven profiles:

    <profiles>
        <profile>
            <id>datasources-32</id>
            <properties>
                <sbt.project.name>graphar</sbt.project.name>
                <spark.version>3.2.4</spark.version>
            </properties>
            <modules>
                <module>graphar</module>
                <module>datasources-32</module>
            </modules>
        </profile>
        <profile>
            <id>datasources-33</id>
            <properties>
                <sbt.project.name>graphar</sbt.project.name>
                <spark.version>3.3.4</spark.version>
            </properties>
            <modules>
                <module>graphar</module>
                <module>datasources-33</module>
            </modules>
        </profile>
        <profile>
            <id>datasources-34</id>
            <properties>
                <sbt.project.name>graphar</sbt.project.name>
                <spark.version>3.4.3</spark.version>
            </properties>
            <modules>
                <module>graphar</module>
                <module>datasources-34</module>
            </modules>
        </profile>
        <profile>
            <id>datasources-35</id>
            <properties>
                <sbt.project.name>graphar</sbt.project.name>
                <spark.version>3.5.1</spark.version>
            </properties>
            <modules>
                <module>graphar</module>
                <module>datasources-35</module>
            </modules>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
        </profile>
    </profiles>

Each of these subfolders is actually a Maven submodule.

So, with this approach, we can build GraphAr Spark against different versions of Spark itself.

At the moment, our CI uses this approach to run the tests for all the supported Maven profiles.
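As a rough sketch of how the profiles above could be selected from the command line: Maven's standard `-P` flag activates a profile by id. The script below only prints one command per profile (a dry run) and does not invoke Maven. The `clean test` goals and the helper name `maven_cmd` are assumptions for illustration, not taken from the project's actual CI configuration, and the commands are assumed to be run from the maven-projects/spark directory.

```shell
#!/bin/sh
# Profile ids copied from the <profiles> section quoted above.
PROFILES="datasources-32 datasources-33 datasources-34 datasources-35"

# Hypothetical helper: print the Maven command for one profile.
# The "clean test" goals are an assumption; adjust them as needed.
maven_cmd() {
    printf 'mvn -P %s clean test\n' "$1"
}

# Dry run: list one command per supported profile.
for profile in $PROFILES; do
    maven_cmd "$profile"
done
```

Running plain `mvn clean test` without `-P` would pick datasources-35, since that profile is marked `activeByDefault`. Naming another profile with `-P` switches the default one off.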

SemyonSinchenko commented 3 months ago

An alternative way to provide this support is to use tags/branches, but I prefer Maven sub-projects. At any given time, about 4-5 versions of Spark are maintained, so I don't think the amount of duplicated code will grow indefinitely: spark-3.2 will reach EoL soon, for example, so we can drop it, etc.

yecol commented 3 months ago

I see. It makes sense! I wasn't aware that the datasource API diverges across Spark versions. Thanks for your kind and detailed response!

apache / incubator-graphar

weird source code structure (maybe bug?) #593
