delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs for multiple languages
https://delta.io
Apache License 2.0

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html #224

Closed: sanjiv1980 closed this issue 4 years ago

sanjiv1980 commented 4 years ago

I am getting this error while executing my fat JAR:


```
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:245)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at com.espn.deltalake.DeltaLakeOverride.main(DeltaLakeOverride.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
```
marmbrus commented 4 years ago

This looks like your fat jar construction process is dropping the metadata file https://github.com/delta-io/delta/blob/master/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister used by the DataSourceRegister lookup https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L28-L51. I'm not sure how you are building your fat jar, but this Stack Overflow question https://stackoverflow.com/questions/32887966/shadow-plugin-gradle-what-does-mergeservicefiles-do might be useful.
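
For reference, a minimal sketch of a maven-shade-plugin configuration that preserves those service files. It relies on the shade plugin's ServicesResourceTransformer, which concatenates every META-INF/services entry from the dependencies (including org.apache.spark.sql.sources.DataSourceRegister) instead of letting one dependency's copy overwrite the others; the plugin version shown here is just an assumption.

```
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- Merge all META-INF/services/* provider files rather than
                         keeping only the first copy found on the classpath. -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```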

sanjiv1980 commented 4 years ago

Here is my POM file. I tried my best to fix it, but could not.


```
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.espn.deltalake</groupId>
    <artifactId>espn-deltalake</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>espn-deltalake</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>0.4.0</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/DEPENDENCIES</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/maven/</resource>
                                </transformer>
                                <!-- <transformer -->
                                <!-- implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer"> -->
                                <!-- <resource>META-INF</resource> -->
                                <!-- </transformer> -->

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>com.espn.deltalake.DeltaLakeOverride</Main-Class>
                                        <Build-Number>1.0</Build-Number>
                                    </manifestEntries>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

        </plugins>

    </build>

</project>
```

marmbrus commented 4 years ago

Sorry, this is probably a better question for a maven support list, as I am unfortunately not an expert in POM files.

If you can't figure out how to preserve the metadata in your uber jar, you can also use the source by specifying its fully qualified name (org.apache.spark.sql.delta.sources.DeltaDataSource).

Note this is not technically a stable API and could change in the future.
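
As a small sketch of that workaround (the spark session, the sample data, and the output path are placeholders for illustration, not part of the original report), the write below passes the fully qualified class name instead of the "delta" short name:

```
// Hypothetical example: bypass the short-name lookup by naming the source class directly.
val data = spark.range(0, 5)
data.write
  .format("org.apache.spark.sql.delta.sources.DeltaDataSource")
  .save("/tmp/delta-table")
```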

sanjiv1980 commented 4 years ago

I found a workaround: we have to add the two transformers below under the <transformers></transformers> section.

    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/services/</resource>
    </transformer>
    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
    </transformer>
dyadagiri commented 4 years ago

Hi Sanjiv,

I have tried the same way: added the Delta jar dependency in the pom and added the transformers to the shade plugin, but didn't get any luck. DataSourceRegister is filled with the below entries:

org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.sources.RateStreamProvider
org.apache.spark.sql.execution.streaming.sources.TextSocketSourceProvider

org.apache.spark.sql.delta.sources.DeltaDataSource

I am getting the below error:

    java.lang.NoSuchMethodError: org.apache.spark.sql.SparkSession$.active()Lorg/apache/spark/sql/SparkSession;
        com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
        com.google.common.cache.LocalCache.get(LocalCache.java:4000)
        com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
        org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
        org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:702)

Have you faced a similar issue while doing this integration? Please let me know if I missed any configuration.

sanjiv1980 commented 4 years ago

I have a solution for that; let me provide it to you. Give me some time.

dyadagiri commented 4 years ago

Thank you.

sanjiv1980 commented 4 years ago

Hi, I have added the below transformers.


    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/services/</resource>
    </transformer>
    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
    </transformer>

You can add a <filters> block parallel to the <transformers> tag, like below:

    <filters>
        <filter>
            <artifact>junit:junit</artifact>
            <includes>
                <include>**</include>
            </includes>
            <excludes>
                <exclude>**</exclude>
            </excludes>
        </filter>
        <filter>
            <artifact>*:*</artifact>
            <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
                <exclude>core-default.xml</exclude>
                <exclude>hdfs-default.xml</exclude>
                <exclude>mapred-default.xml</exclude>
                <exclude>yarn-default.xml</exclude>
            </excludes>
        </filter>
    </filters>

Note: can you try with the following versions?


    <spark.version>2.4.4</spark.version>
    <delta-lake-version>0.3.0</delta-lake-version>

    <dependency>
        <groupId>io.delta</groupId>
        <artifactId>delta-core_2.11</artifactId>
        <version>${delta-lake-version}</version>
    </dependency>

Let me know if it worked; otherwise, I can give you my POM.xml.
dyadagiri commented 4 years ago

Hi Sanjiv,

Still facing the same issue. Could you please share your pom.xml file? It's failing at the below statement:

 val data = spark.range(0, 5)
 data.write.format("delta").save("/user/yadagiri/deltapath/")   //unable to write data in delta format

Please check my pom.xml

```
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.yadagiri.spark</groupId>
<artifactId>proje</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>yadagiri</name>

<properties>
    <scala.tools.version>2.11</scala.tools.version>
    <scala.version>2.11.11</scala.version>
    <spark.version>2.4.4</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>io.delta</groupId>
        <artifactId>delta-core_${scala.tools.version}</artifactId>
        <version>0.3.0</version>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.tools.version}</artifactId>
        <version>3.0.3</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.holdenkarau</groupId>
        <artifactId>spark-testing-base_${scala.tools.version}</artifactId>
        <version>2.4.3_0.12.0</version>
        <scope>test</scope>
    </dependency>

</dependencies>

<build>
    <finalName>yadagiri</finalName>

    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <executions>
                <execution>
                    <id>compile</id>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                    <phase>compile</phase>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <goals>
                        <goal>testCompile</goal>
                    </goals>
                    <phase>test-compile</phase>
                </execution>
                <execution>
                    <phase>process-resources</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <id>add-source</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>src/main/scala</source>
                            <source>src/main/java</source>
                        </sources>
                    </configuration>
                </execution>
                <execution>
                    <id>add-test-source</id>
                    <phase>generate-test-sources</phase>
                    <goals>
                        <goal>add-test-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>src/test/scala</source>
                            <source>src/test/java</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <filters>
                    <filter>
                        <artifact>junit:junit</artifact>
                        <includes>
                            <include>**</include>
                        </includes>
                        <excludes>
                            <exclude>**</exclude>
                        </excludes>
                    </filter>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                            <exclude>core-default.xml</exclude>
                            <exclude>hdfs-default.xml</exclude>
                            <exclude>mapred-default.xml</exclude>
                            <exclude>yarn-default.xml</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
                    </transformer>
                </transformers>
            </configuration>
        </plugin>
    </plugins>
</build>

```

sanjiv1980 commented 4 years ago

Please add this transformer:


    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/services/</resource>
    </transformer>

because otherwise the entries are getting overwritten inside the services directory.
dyadagiri commented 4 years ago

I have added it, but there is no change in the error. And the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file is updated the same way:

org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.sources.RateStreamProvider
org.apache.spark.sql.execution.streaming.sources.TextSocketSourceProvider

org.apache.spark.sql.delta.sources.DeltaDataSource

Here an empty line is appended to the file before DeltaDataSource.

sanjiv1980 commented 4 years ago

Let me try here; I will update you.

dyadagiri commented 4 years ago

Hi Sanjiv, any luck? Could you please share your pom.xml? I will try with that.

sanjiv1980 commented 4 years ago

Hi, can you try the below pom.xml?


```
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.espn.deltalake</groupId>
    <artifactId>espn-deltalake</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>espn-deltalake</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>

        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.11</artifactId>
            <version>0.4.0</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>META-INF/maven/**</exclude>
                                        <exclude>LICENSE</exclude>
                                        <exclude>META-INF/license/**</exclude>
                                    </excludes>
                                </filter>
                                <!-- <filter> <artifact>org.apache.spark:spark-sql_2.11</artifact> 
                                    <excludes> <exclude>META-INF/*</exclude> </excludes> </filter> -->
                            </filters>

                            <!-- <relocations> <relocation> <pattern>com</pattern> <shadedPattern>repackaged.com</shadedPattern> 
                                <includes> <include>com.google.protobuf.**</include> <include>com.google.common.**</include> 
                                </includes> </relocation> </relocations> -->
                            <transformers>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/</resource>
                                </transformer>
                                <!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> 
                                    <resource>META-INF/DEPENDENCIES</resource> </transformer> -->
                                <!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> 
                                    <resource>META-INF/maven/</resource> </transformer> -->
                                <!-- <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer"> 
                                    <resource>META-INF</resource> <file>org.apache.spark.sql.sources.DataSourceRegister</file> 
                                    </transformer> -->
                                <!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> 
                                    <resource> META-INF/services/org.apache.spark.sql.sources.DataSourceRegister 
                                    </resource> </transformer> -->
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.eclipse.jetty.http.HttpFieldPreEncoder</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.io.compress.CompressionCodec</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.security.alias.CredentialProviderFactory</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.security.SecurityInfo</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>common-version-info.properties</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>core-default.xml</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>org.apache.hadoop.application-classloader.properties</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>digesterRules.xml</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>hdfs-default.xml</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.security.token.TokenRenewer</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.xml.datatype.DatatypeFactory</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.xml.parsers.DocumentBuilderFactory</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.xml.parsers.SAXParserFactory</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.xml.validation.SchemaFactory</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.w3c.dom.DOMImplementationSourceList</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.xml.sax.driver</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>mapred-default.xml</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>yarn-default.xml</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>yarn-version-info.properties</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/linux32/libleveldbjni.so</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/linux64/libleveldbjni.so</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/osx/libleveldbjni.jnilib</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/windows32/leveldbjni.dll</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/windows64/leveldbjni.dll</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/com.fasterxml.jackson.core.JsonFactory</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.commons.logging.LogFactory</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.glassfish.hk2.extension.ServiceLocatorGenerator</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.ws.rs.ext.RuntimeDelegate</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.glassfish.jersey.internal.spi.AutoDiscoverable</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.glassfish.jersey.internal.spi.ForcedAutoDiscoverable</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/javax.servlet.ServletContainerInitializer</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.glassfish.jersey.servlet.spi.AsyncContextDelegateProvider</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.glassfish.jersey.servlet.spi.FilterUrlMappingsProvider</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/libnetty_transport_native_epoll_x86_64.so</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/native/libnetty_transport_native_kqueue_x86_64.jnilib</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/com.fasterxml.jackson.core.ObjectCodec</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/com.fasterxml.jackson.databind.Module</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.spark.status.AppHistoryServerPlugin</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>scala-parser-combinators.properties</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>org.codehaus.commons.compiler.properties</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.orc.DataMask$Provider</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>parquet.thrift</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/sun.net.spi.nameservice.NameServiceDescriptor</resource>
                                </transformer>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.eclipse.jetty.http.HttpFieldPreEncoder</resource>
                                </transformer>

                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>com.espn.deltalake.DeltaLakeOverride</Main-Class>
                                        <Build-Number>1.0</Build-Number>
                                    </manifestEntries>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <!-- <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-dependency-plugin</artifactId> 
                <executions> <execution> <id>copy-dependencies</id> <phase>prepare-package</phase> 
                <goals> <goal>copy-dependencies</goal> </goals> <configuration> <outputDirectory>${project.build.directory}/classes/lib</outputDirectory> 
                <overWriteReleases>false</overWriteReleases> <overWriteSnapshots>false</overWriteSnapshots> 
                <overWriteIfNewer>true</overWriteIfNewer> </configuration> </execution> </executions> 
                </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-jar-plugin</artifactId> 
                <configuration> <archive> <manifest> <addClasspath>true</addClasspath> <classpathPrefix>lib/</classpathPrefix> 
                <mainClass>com.espn.deltalake.DeltaLakeOverride</mainClass> </manifest> </archive> 
                </configuration> </plugin> -->

        </plugins>

    </build>

</project>
```

sanjiv1980 commented 4 years ago

Let me know if it works; I will make it shorter.

sanjiv1980 commented 4 years ago

@dyadagiri has your problem been solved?

fangz5 commented 4 years ago

Had the same issue in my sbt project. The fully qualified name suggested above is a quick workaround.

kkhisamov-peernova commented 4 years ago

@dyadagiri have you solved your problem? I have the same issue. I mean java.lang.NoSuchMethodError: org.apache.spark.sql.SparkSession$.active()Lorg/apache/spark/sql/SparkSession exception

Vishvin95 commented 3 years ago

@marmbrus @fangz5 The solution of providing the fully qualified name of the source works when I want to read using spark.read.format, but if we want to use DeltaTable.forPath(), internally it reads through spark.read.format("delta"), so this blocks us from using the DeltaTable API. How can I resolve this? I see that META-INF/services is already getting included in my JAR, so, as you said, it shouldn't be a problem, but it still gives the same error. Please help on an urgent basis.

AnaMariaIlie commented 3 years ago

I'm having the same problem. Have you found a solution?

Vishvin95 commented 3 years ago

@AnaMariaIlie Yes, I realized that to make it work correctly, we must have META-INF/services/org.apache.spark.sql.sources.DataSourceRegister included in the JAR. A brief background: typically this services folder is used for keeping runtime configurations. Now, if we have other data sources which reference a similarly named file (EventHubs, etc.), then we need to check the MergeStrategy (in SBT) or what the above answers refer to as "transformers" in Maven. So we need to merge all such named files and include them in our JAR as a single file. This is the piece of code from my build.sbt which I used to solve this:

    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "services", xg @ _*) => MergeStrategy.concat
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x => MergeStrategy.first
    }

After this, when we open the JAR, we can see all the files with this name merged into a single file with the content concatenated. Something like this for me:

org.apache.spark.sql.eventhubs.EventHubsSourceProvider
org.apache.spark.sql.delta.sources.DeltaDataSource
org.apache.spark.sql.v2.avro.AvroDataSourceV2
org.apache.spark.sql.kafka010.KafkaSourceProvider

Now you can simply use spark.read.format("delta") or DeltaTable.forPath().

Hope this helps!

AnaMariaIlie commented 3 years ago

Thank you for the quick answer! Indeed, that is the problem, but I'm using Maven and I still can't concatenate them :( It picks just KafkaSourceProvider.

Vishvin95 commented 3 years ago

@AnaMariaIlie Check this out: https://mvnrepository.com/artifact/org.zcore.maven/merge-maven-plugin

AnaMariaIlie commented 3 years ago

@Vishvin95 Thank you! My problem was that I was using both the maven-shade and maven-assembly plugins. Removing the maven-assembly plugin solved it!

nathan-bennett commented 2 years ago

I have run into a similar issue when using Gradle. Does anyone know how to concatenate the META-INF files when using Gradle?

I tried this; however, it just caused another failure:

    shadowJar {
        mergeServiceFiles()
    }

nathan-bennett commented 2 years ago

FYI: I solved this problem when using Gradle:

    shadowJar {
        mergeServiceFiles("META-INF/services/org.apache.spark.sql.sources.DataSourceRegister")
    }