Closed: sanjiv1980 closed this issue 4 years ago.
This looks like your fat jar construction process is dropping the metadata file https://github.com/delta-io/delta/blob/master/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister used by the DataSourceRegister mechanism https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L28-L51. I'm not sure how you are building your fat jar, but this Stack Overflow question https://stackoverflow.com/questions/32887966/shadow-plugin-gradle-what-does-mergeservicefiles-do might be useful.
On Fri, Oct 25, 2019 at 6:50 AM sanjiv kumar notifications@github.com wrote:
Getting this error while executing my fat JAR:
```
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:245)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at com.espn.deltalake.DeltaLakeOverride.main(DeltaLakeOverride.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
```
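For context on why the failure surfaces as `ClassNotFoundException: delta.DefaultSource`: Spark's `lookupDataSource` first consults the service registry built from the `META-INF/services` file for a provider whose short name is "delta", and only when that fails falls back to loading a class literally named `delta.DefaultSource`. A plain-Scala sketch of that fallback (the registry contents are illustrative; no Spark needed):

```scala
// Sketch (no Spark required) of DataSource.lookupDataSource's resolution order:
// service-file registry first, then the "<format>.DefaultSource" class-name guess.
object LookupSketch {
  // What the merged DataSourceRegister service file yields via ServiceLoader,
  // *after* the fat-jar build dropped delta-core's entry:
  val registeredShortNames: Map[String, String] = Map(
    "parquet" -> "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat",
    "csv"     -> "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")

  def lookup(format: String): String =
    registeredShortNames.getOrElse(format,
      // Fallback: Spark tries to load "<format>.DefaultSource" directly,
      // which is exactly the class name in the stack trace above.
      throw new ClassNotFoundException(s"$format.DefaultSource"))

  def main(args: Array[String]): Unit = {
    println(lookup("parquet"))
    try lookup("delta")
    catch { case e: ClassNotFoundException => println(s"Failed: ${e.getMessage}") }
  }
}
```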
Here is my POM file. I tried my best to fix it, but could not:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.espn.deltalake</groupId>
<artifactId>espn-deltalake</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>espn-deltalake</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/DEPENDENCIES</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/maven/</resource>
</transformer>
<!-- <transformer -->
<!-- implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer"> -->
<!-- <resource>META-INF</resource> -->
<!-- </transformer> -->
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>com.espn.deltalake.DeltaLakeOverride</Main-Class>
<Build-Number>1.0</Build-Number>
</manifestEntries>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
```
Sorry, this is probably a better question for a maven support list, as I am unfortunately not an expert in POM files.
If you can't figure out how to preserve the metadata in your uber jar, you can also use the source by specifying its fully qualified name (org.apache.spark.sql.delta.sources.DeltaDataSource).
Note this is not technically a stable API and could change in the future.
I found the workaround: we have to add the below two transformers under the `<transformers></transformers>` section:

```xml
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
```
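Not mentioned in this thread, but worth knowing: instead of listing service files one by one with `AppendingTransformer`, the maven-shade-plugin also documents a dedicated `ServicesResourceTransformer` that concatenates all same-named resources under `META-INF/services/` in one declaration. A minimal sketch, as a drop-in for the `<transformers>` section:

```xml
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
```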
Hi Sanjiv,
I have tried the same way: added the Delta jar dependency in the POM and added the transformers to the shade plugin, but didn't get any luck. DataSourceRegister is filled with the below entries:

```
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.sources.RateStreamProvider
org.apache.spark.sql.execution.streaming.sources.TextSocketSourceProvider
org.apache.spark.sql.delta.sources.DeltaDataSource
```

I am getting the below error:

```
java.lang.NoSuchMethodError: org.apache.spark.sql.SparkSession$.active()Lorg/apache/spark/sql/SparkSession;
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
com.google.common.cache.LocalCache.get(LocalCache.java:4000)
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:702)
```

Have you faced a similar issue while doing the integration? Please let me know if I missed any configuration.
I have a solution for that; let me provide it to you. Give me some time.
Thank you.
Hi, I have added the below transformers:

```xml
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
```
You can add a `<filters>` block parallel to the `<transformers>` tag, like below:

```xml
<filters>
  <filter>
    <artifact>junit:junit</artifact>
    <includes>
      <include>**</include>
    </includes>
    <excludes>
      <exclude>**</exclude>
    </excludes>
  </filter>
  <filter>
    <artifact>*:*</artifact>
    <excludes>
      <exclude>META-INF/*.SF</exclude>
      <exclude>META-INF/*.DSA</exclude>
      <exclude>META-INF/*.RSA</exclude>
      <exclude>core-default.xml</exclude>
      <exclude>hdfs-default.xml</exclude>
      <exclude>mapred-default.xml</exclude>
      <exclude>yarn-default.xml</exclude>
    </excludes>
  </filter>
</filters>
```
Note: can you try with

```xml
<!-- properties -->
<spark.version>2.4.4</spark.version>
<delta-lake-version>0.3.0</delta-lake-version>

<!-- dependency -->
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-core_2.11</artifactId>
  <version>${delta-lake-version}</version>
</dependency>
```
Let me know if it worked; otherwise I can give you my pom.xml.
Hi Sanjiv,
Still facing the same issue. Could you please share your pom.xml file? It's failing at the below statement:

```scala
val data = spark.range(0, 5)
data.write.format("delta").save("/user/yadagiri/deltapath/") // unable to write data in delta format
```

Please check my pom.xml:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.yadagiri.spark</groupId>
<artifactId>proje</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>yadagiri</name>
<properties>
<scala.tools.version>2.11</scala.tools.version>
<scala.version>2.11.11</scala.version>
<spark.version>2.4.4</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_${scala.tools.version}</artifactId>
<version>0.3.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.tools.version}</artifactId>
<version>3.0.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.holdenkarau</groupId>
<artifactId>spark-testing-base_${scala.tools.version}</artifactId>
<version>2.4.3_0.12.0</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<finalName>yadagiri</finalName>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<id>add-source</id>
<phase>generate-sources</phase>
<goals>
<goal>add-source</goal>
</goals>
<configuration>
<sources>
<source>src/main/scala</source>
<source>src/main/java</source>
</sources>
</configuration>
</execution>
<execution>
<id>add-test-source</id>
<phase>generate-test-sources</phase>
<goals>
<goal>add-test-source</goal>
</goals>
<configuration>
<sources>
<source>src/test/scala</source>
<source>src/test/java</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
<configuration>
<filters>
<filter>
<artifact>junit:junit</artifact>
<includes>
<include>**</include>
</includes>
<excludes>
<exclude>**</exclude>
</excludes>
</filter>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>core-default.xml</exclude>
<exclude>hdfs-default.xml</exclude>
<exclude>mapred-default.xml</exclude>
<exclude>yarn-default.xml</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
</transformers>
</configuration>
</plugin>
</plugins>
</build>
</project>
```
Please add this line in the transformers section:

```xml
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/</resource>
</transformer>
```

because the data inside the services directory are getting overridden.
I have added it, but there is no change in the error. In the same way, the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file got updated:

```
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.sources.RateStreamProvider
org.apache.spark.sql.execution.streaming.sources.TextSocketSourceProvider
org.apache.spark.sql.delta.sources.DeltaDataSource
```

Here an empty line is appended to the file before DeltaDataSource.
Let me try it here; I will update you.
Thanks & regards, Sanjiv Singh
Hi Sanjiv, any luck? Could you please share your pom.xml? I will try with that.
Hi, can you try the below pom.xml?
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.espn.deltalake</groupId>
<artifactId>espn-deltalake</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>espn-deltalake</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.11</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/maven/**</exclude>
<exclude>LICENSE</exclude>
<exclude>META-INF/license/**</exclude>
</excludes>
</filter>
<!-- <filter> <artifact>org.apache.spark:spark-sql_2.11</artifact>
<excludes> <exclude>META-INF/*</exclude> </excludes> </filter> -->
</filters>
<!-- <relocations> <relocation> <pattern>com</pattern> <shadedPattern>repackaged.com</shadedPattern>
<includes> <include>com.google.protobuf.**</include> <include>com.google.common.**</include>
</includes> </relocation> </relocations> -->
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/</resource>
</transformer>
<!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/DEPENDENCIES</resource> </transformer> -->
<!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/maven/</resource> </transformer> -->
<!-- <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
<resource>META-INF</resource> <file>org.apache.spark.sql.sources.DataSourceRegister</file>
</transformer> -->
<!-- <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource> META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
</resource> </transformer> -->
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.eclipse.jetty.http.HttpFieldPreEncoder</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.io.compress.CompressionCodec</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.security.alias.CredentialProviderFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.security.SecurityInfo</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>common-version-info.properties</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>core-default.xml</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>org.apache.hadoop.application-classloader.properties</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>digesterRules.xml</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>hdfs-default.xml</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.security.token.TokenRenewer</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.xml.datatype.DatatypeFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.xml.parsers.DocumentBuilderFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.xml.parsers.SAXParserFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.xml.validation.SchemaFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.w3c.dom.DOMImplementationSourceList</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.xml.sax.driver</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.hadoop.security.token.TokenIdentifier</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>mapred-default.xml</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>yarn-default.xml</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>yarn-version-info.properties</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/linux32/libleveldbjni.so</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/linux64/libleveldbjni.so</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/osx/libleveldbjni.jnilib</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/windows32/leveldbjni.dll</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/windows64/leveldbjni.dll</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/com.fasterxml.jackson.core.JsonFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.commons.logging.LogFactory</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.glassfish.hk2.extension.ServiceLocatorGenerator</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.ws.rs.ext.RuntimeDelegate</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.glassfish.jersey.internal.spi.AutoDiscoverable</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.glassfish.jersey.internal.spi.ForcedAutoDiscoverable</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/javax.servlet.ServletContainerInitializer</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.glassfish.jersey.servlet.spi.AsyncContextDelegateProvider</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.glassfish.jersey.servlet.spi.FilterUrlMappingsProvider</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/libnetty_transport_native_epoll_x86_64.so</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/native/libnetty_transport_native_kqueue_x86_64.jnilib</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/com.fasterxml.jackson.core.ObjectCodec</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/com.fasterxml.jackson.databind.Module</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.spark.status.AppHistoryServerPlugin</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>scala-parser-combinators.properties</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>org.codehaus.commons.compiler.properties</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.orc.DataMask$Provider</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>parquet.thrift</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/sun.net.spi.nameservice.NameServiceDescriptor</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.eclipse.jetty.http.HttpFieldPreEncoder</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<manifestEntries>
<Main-Class>com.espn.deltalake.DeltaLakeOverride</Main-Class>
<Build-Number>1.0</Build-Number>
</manifestEntries>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
<!-- <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-dependency-plugin</artifactId>
<executions> <execution> <id>copy-dependencies</id> <phase>prepare-package</phase>
<goals> <goal>copy-dependencies</goal> </goals> <configuration> <outputDirectory>${project.build.directory}/classes/lib</outputDirectory>
<overWriteReleases>false</overWriteReleases> <overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer> </configuration> </execution> </executions>
</plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-jar-plugin</artifactId>
<configuration> <archive> <manifest> <addClasspath>true</addClasspath> <classpathPrefix>lib/</classpathPrefix>
<mainClass>com.espn.deltalake.DeltaLakeOverride</mainClass> </manifest> </archive>
</configuration> </plugin> -->
</plugins>
</build>
</project>
Let me know if it works; I will make it shorter.
@dyadagiri has your problem been solved?
> Sorry, this is probably a better question for a Maven support list, as I am unfortunately not an expert in POM files.
> If you can't figure out how to preserve the metadata in your uber jar, you can also use the source by specifying its fully qualified name (org.apache.spark.sql.delta.sources.DeltaDataSource). Note this is not technically a stable API and could change in the future.
Had the same issue in my sbt project. The fully qualified name is a quick workaround.
@dyadagiri have you solved your problem? I have the same issue, I mean the java.lang.NoSuchMethodError: org.apache.spark.sql.SparkSession$.active()Lorg/apache/spark/sql/SparkSession; exception.
@marmbrus @fangz5 The solution of providing the fully qualified name of the source works when I want to read using spark.read.format, but if we want to use DeltaTable.forPath(), internally it reads through spark.read.format("delta"), so this blocks using the DeltaTable API. How can I resolve this? I see that META-INF/services is already getting included in my JAR, so as you said, it shouldn't be a problem, but it still gives the same error. Please help urgently.
I'm having the same problem. Have you found a solution?
@AnaMariaIlie Yes. I realized that to make this work, META-INF/services/org.apache.spark.sql.sources.DataSourceRegister must be included in the JAR. For background: the services directory holds service-provider configuration files that are looked up at runtime. If other data sources (EventHubs, Kafka, etc.) ship a file with the same name, we need to set the merge strategy (assemblyMergeStrategy in sbt, or what the answers above call "Transformers" in Maven) so that all of those files are concatenated into a single file in the JAR. This is the piece of my build.sbt that solved it:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case x                                         => MergeStrategy.first
}
After this, when we open the JAR, we can see all the files with this name, merged into a single file with content concatenated. Something like this for me:
org.apache.spark.sql.eventhubs.EventHubsSourceProvider
org.apache.spark.sql.delta.sources.DeltaDataSource
org.apache.spark.sql.v2.avro.AvroDataSourceV2
org.apache.spark.sql.kafka010.KafkaSourceProvider
Now, you can use simply, spark.read.format("delta") or DeltaTable.forPath()
Hope this helps!
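Under the hood, Spark discovers short names like "delta" through Java's ServiceLoader, which reads exactly these META-INF/services files: the file is named after the interface and each line lists an implementation class. A self-contained sketch of that mechanism (Registrar and DeltaLikeSource are made-up stand-ins for Spark's DataSourceRegister and Delta's DeltaDataSource):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceFileDemo {
    // Stand-in for org.apache.spark.sql.sources.DataSourceRegister.
    public interface Registrar { String shortName(); }

    // Stand-in for org.apache.spark.sql.delta.sources.DeltaDataSource.
    public static class DeltaLikeSource implements Registrar {
        public String shortName() { return "delta"; }
    }

    // Builds a classpath entry containing only the service file, then asks
    // ServiceLoader what it can find -- the same lookup Spark performs.
    public static List<String> discover() throws IOException {
        Path dir = Files.createTempDirectory("svc-demo");
        Path svc = dir.resolve("META-INF/services/" + Registrar.class.getName());
        Files.createDirectories(svc.getParent());
        Files.write(svc, Collections.singletonList(DeltaLikeSource.class.getName()));

        List<String> shortNames = new ArrayList<>();
        try (URLClassLoader cl = new URLClassLoader(
                new URL[]{dir.toUri().toURL()},
                ServiceFileDemo.class.getClassLoader())) {
            for (Registrar r : ServiceLoader.load(Registrar.class, cl)) {
                shortNames.add(r.shortName());
            }
        }
        return shortNames;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(discover());
    }
}
```

If the fat jar build drops or overwrites this file, the lookup finds nothing and Spark falls back to Class.forName("delta.DefaultSource"), which is exactly the ClassNotFoundException in the stack trace above.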
Thank you for the quick answer! Indeed, that is the problem, but I'm using Maven and I still can't concatenate them :( It picks just KafkaSourceProvider.
@AnaMariaIlie Check this out: https://mvnrepository.com/artifact/org.zcore.maven/merge-maven-plugin
@Vishvin95 Thank you! My problem was that I was using both maven-shade and maven-assembly plugins. Removing maven-assembly plugin solved it!
I have run into a similar issue when using Gradle. Does anyone know how to concatenate the META-INF service files when using Gradle?
I tried this, but it just caused another failure:
shadowJar { mergeServiceFiles() }
FYI, I solved this problem when using Gradle:
shadowJar {
mergeServiceFiles("META-INF/services/org.apache.spark.sql.sources.DataSourceRegister")
}