elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0
1.93k stars 986 forks source link

Spark dependency is not compatible resulting in compile error #2199

Open dimitarKiryakov opened 4 months ago

dimitarKiryakov commented 4 months ago

What kind an issue is this?

Issue description

When adding the following dependency with spark 3.5.0 mvn clean install fails on compile stage:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_2.13</artifactId>
            <version>8.12.1</version>
        </dependency>

We use java 17 for the build.

Steps to reproduce

Code: Pom

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>data-streaming</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <scala.binary.version>2.12</scala.binary.version>
        <spark.version>3.5.0</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_${scala.binary.version}</artifactId>
            <version>8.12.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <resources>
            <resource>
                <directory>resources</directory>
            </resource>
        </resources>
    </build>

</project>
  1. Then add some code that uses: org.apache.spark.sql.Row or org.apache.spark.sql.types.*
  2. Execute mvn clean install and it will fail with a compile error

Excluding spark_catalyst resolves this issue, but still, the dependency should be fixed:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_${scala.binary.version}</artifactId>
            <version>8.12.1</version>
            <exclusions>
                <exclusion>
                    <artifactId>spark-catalyst_2.12</artifactId>
                    <groupId>org.apache.spark</groupId>
                </exclusion>
            </exclusions>
        </dependency>

Version Info

OS: M1 Mac :
JVM : 17 Hadoop/Spark: 3.5.0 ES-Hadoop : 8.12.1 ES : 8.12.1

masseyke commented 4 months ago

Can you paste the compiler error here? Also, I notice that you have a variable called scala.binary.version that is set to 2.12 but you are using elasticsearch-spark-30_2.13 (that last part is the scala version, 2.13). You might run into compatibility problems with scala 2.12 and 2.13.