sinban04 opened 8 months ago
I tried Spark 3.2.4 with Delta 2.0.2 (https://docs.delta.io/latest/releases.html), but unfortunately it returns the same error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:443)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at io.delta.implicits.package$DeltaDataFrameWriter$.delta$extension(package.scala:59)
at com.naver.airspace.recsysops.App$.refine(App.scala:65)
at com.naver.airspace.recsysops.App$.main(App.scala:112)
at com.naver.airspace.recsysops.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:966)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:191)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1054)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1063)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:656)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:656)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:656)
... 20 more
When I tried it with spark-shell (https://docs.delta.io/latest/quick-start.html#spark-scala-shell), it works fine with Spark 3.4.1:
spark-shell --packages io.delta:delta-core_2.12:2.4.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
Since this spark-shell setup works, and PySpark 3.4.1 also worked fine, why is there an error with Scala Spark?
But when I tried Spark 3.1.2-2:
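One way to narrow down why spark-shell with --packages works while the packaged Scala application does not is to ask the application's own class loader what it can see at runtime. The sketch below is a hypothetical helper (not part of Spark or Delta) that uses only JDK APIs: it checks whether a class can be loaded and lists every copy of META-INF/services/org.apache.spark.sql.sources.DataSourceRegister visible on the classpath. It assumes that org.apache.spark.sql.delta.sources.DeltaDataSource is the class Delta registers under the short name delta.

```scala
import scala.collection.JavaConverters._
import scala.util.Try

// Hypothetical diagnostic helper; uses only JDK APIs, so it can be called
// from the application's main() before the first .format("delta") call.
object DeltaClasspathCheck {
  // Provider-configuration file that Spark's data source lookup reads via ServiceLoader.
  val ServiceFile = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

  // Assumed name of Delta's DataSourceRegister implementation.
  val DeltaSourceClass = "org.apache.spark.sql.delta.sources.DeltaDataSource"

  private def loader: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader).getOrElse(getClass.getClassLoader)

  // True if the class can be found by the loader (without initializing it).
  def canLoad(className: String): Boolean =
    Try(Class.forName(className, false, loader)).isSuccess

  // URLs of every copy of the service file visible to the loader
  // (roughly one per jar that ships it).
  def serviceFileUrls(): List[String] =
    loader.getResources(ServiceFile).asScala.map(_.toString).toList

  def main(args: Array[String]): Unit = {
    println(s"$DeltaSourceClass loadable: ${canLoad(DeltaSourceClass)}")
    serviceFileUrls().foreach(url => println(s"service file: $url"))
  }
}
```

If canLoad reports false, the Delta classes are not on the runtime classpath at all; if it reports true, the next thing to inspect is whether any of the listed service files actually mentions Delta's provider class.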
bin/spark-shell --packages io.delta:delta-core_2.12:1.0.1 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
(It seems the cause of this error is an issue internal to my system; it works fine with the downloaded Spark distribution.)
I spent quite a lot of time on this and figured out some facts. Trying spark-sql on several Spark versions, I found that from Delta 1.2.0 onward we need to import not only delta-core but also delta-storage; before 1.2.0, delta-core alone is enough. (With those dependencies I could run PySpark without errors.)
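The version rule observed above can be written down as a small helper. This is a hypothetical sketch for illustration, not a Delta API; it assumes plain major.minor.patch version strings and the Scala 2.12 build of delta-core used throughout this thread.

```scala
// Hypothetical helper encoding the observation above: delta-core alone
// suffices before Delta 1.2.0; from 1.2.0 on, delta-storage is also needed.
object DeltaArtifacts {
  // Parses "major.minor[.patch]" into (major, minor); assumes numeric parts.
  private def majorMinor(version: String): (Int, Int) = {
    val parts = version.split('.')
    (parts(0).toInt, parts(1).toInt)
  }

  // Maven coordinates (groupId:artifactId:version) to put on the classpath.
  def required(deltaVersion: String): Seq[String] = {
    val core = s"io.delta:delta-core_2.12:$deltaVersion"
    val (major, minor) = majorMinor(deltaVersion)
    val needsStorage = major > 1 || (major == 1 && minor >= 2)
    if (needsStorage) Seq(core, s"io.delta:delta-storage:$deltaVersion")
    else Seq(core)
  }
}
```

For example, DeltaArtifacts.required("2.0.2").mkString(",") yields a comma-separated coordinate list in the same form --packages accepts, while required("1.0.1") returns only the delta-core coordinate.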
I succeeded in running Scala Spark by passing the Delta libraries with the jars option, as with PySpark.
It read the Delta files correctly.
However, it still fails when the dependencies are declared through Maven, even with both delta-core and delta-storage.
On Spark 3.2.4:
<!-- https://mvnrepository.com/artifact/io.delta/delta-core -->
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>2.0.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.delta/delta-storage -->
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-storage</artifactId>
<version>2.0.2</version>
</dependency>
For Scala Spark (as far as I know), it seems to be another dependency problem like the one before (https://github.com/delta-io/delta/issues/224).
I also tried this with sbt, building with the assembly plugin, and it still shows exactly the same error as with Maven.
I added the dependency
"io.delta" % "delta-core_2.12" % "1.0.1",
and it still shows:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:692)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:265)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
at com.naver.airspace.recsysops.Main$.runSpark(Main.scala:85)
at com.naver.airspace.recsysops.Main$.main(Main.scala:141)
at com.naver.airspace.recsysops.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:666)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:666)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
... 18 more
Are you sure that
"io.delta" % "delta-core_2.12" % "1.0.1"
includes the proper dependencies?
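For sbt, here is a hedged sketch of a build.sbt that declares the delta-core dependency quoted above for Spark 3.1.x, keeps Spark itself as provided, and makes sbt-assembly merge META-INF/services files line by line. A custom merge strategy that discards or picks one copy of everything under META-INF is a common way for Delta's DataSourceRegister entry to disappear from an uber jar. The Scala and Spark patch versions are assumptions, and the syntax assumes sbt 1.x with the sbt-assembly plugin enabled in project/plugins.sbt.

```scala
// build.sbt (sketch; assumes sbt 1.x and the sbt-assembly plugin)
scalaVersion := "2.12.10" // assumed; should match the Scala version of the Spark distribution

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster at runtime, so keep it out of the uber jar
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  // Delta 1.0.x targets Spark 3.1.x; before Delta 1.2.0, delta-core alone is enough
  "io.delta" %% "delta-core" % "1.0.1"
)

assembly / assemblyMergeStrategy := {
  // Merge service registrations (dropping duplicate lines) so Delta's
  // DataSourceRegister entry survives next to those of other jars
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // Any broader META-INF rule (e.g. a blanket discard) must come after the case above
  case x =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(x)
}
```

Falling back to the plugin's default strategy for everything else keeps the change minimal; the explicit services case mainly matters when a build already overrides META-INF handling.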
Issue Description
Hello, I'm trying to use the delta format on Spark 3.1.2-2 with Scala. I followed the Quickstart guide and found the compatible Delta version on this page. I used this Maven repo and Delta version 1.0.1 for Spark 3.1.2-2. I built with the Delta dependencies and added the configuration during spark-submit.
Command & Configs
And in the source:
I checked that my SparkSession contained all the Delta configs (https://rmoff.net/2023/04/05/using-delta-from-pyspark-java.lang.classnotfoundexception-delta.defaultsource/).
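A minimal sketch of that config check, assuming the two settings from the Delta quickstart (the same ones passed with --conf to spark-shell in this thread) are the ones to verify. The object and method names are hypothetical; in an application the actual settings could come from spark.conf.getAll. Having these configs set does not by itself make the delta source resolvable: that still depends on Delta's classes and its DataSourceRegister service entry being on the runtime classpath.

```scala
// Hypothetical helper: reports Delta-related settings that are missing or
// different from the values used in the Delta quickstart.
object DeltaConfCheck {
  val Expected: Map[String, String] = Map(
    "spark.sql.extensions" -> "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog" -> "org.apache.spark.sql.delta.catalog.DeltaCatalog"
  )

  // Returns human-readable problems, sorted by key; empty means all expected keys match.
  // spark.sql.extensions may hold a comma-separated list, so membership is checked.
  def problems(actual: Map[String, String]): Seq[String] =
    Expected.toSeq.sortBy(_._1).flatMap { case (key, want) =>
      actual.get(key) match {
        case None => Some(s"$key is not set")
        case Some(v) if !v.split(",").map(_.trim).contains(want) =>
          Some(s"$key = $v (expected to include $want)")
        case _ => None
      }
    }
}
```

Usage inside the application might look like DeltaConfCheck.problems(spark.conf.getAll).foreach(println).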
Logs
But I got an error with this log:
Related Issue
I checked the issues in this repo, and most of them use PySpark, which is not my case: https://github.com/delta-io/delta/issues/1013
Besides, my über jar contains all the class files (https://github.com/delta-io/delta/issues/700, https://github.com/delta-io/delta/issues/224): not only META-INF/services/org.apache.spark.sql.sources.DataSourceRegister but also all the io.delta classes. (I used the assembly plugin, so I have never run into a dependency problem so far.) The jar seems to contain DataSourceRegister, but Spark still can't find the source (https://github.com/delta-io/delta/issues/947). Could you help me out with this issue? What am I missing?
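Since the question hinges on whether the uber jar's service file really lists Delta's data source, one way to check is to read that entry straight out of the jar. The following is a hypothetical sketch using only JDK and Scala standard library APIs; it assumes the provider class name org.apache.spark.sql.delta.sources.DeltaDataSource, and the jar path is a command-line argument you supply. If the file exists but that line is missing (for example because another jar's copy replaced Delta's during assembly), Spark's ServiceLoader-based lookup of the short name delta finds nothing and falls back to trying delta.DefaultSource, which matches the ClassNotFoundException in the logs.

```scala
import java.util.jar.JarFile
import scala.io.Source

// Hypothetical checker: does a jar's DataSourceRegister service file list Delta?
object ServiceFileCheck {
  val Entry = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"
  val DeltaProvider = "org.apache.spark.sql.delta.sources.DeltaDataSource" // assumed class name

  // ServiceLoader file format: one class name per line; '#' starts a comment.
  def providers(content: String): Seq[String] =
    content.split("\n").toSeq
      .map(line => line.takeWhile(_ != '#').trim)
      .filter(_.nonEmpty)

  // Reads the service entry from the given jar, if present.
  def readEntry(jarPath: String): Option[String] = {
    val jar = new JarFile(jarPath)
    try {
      Option(jar.getJarEntry(Entry)).map { e =>
        val src = Source.fromInputStream(jar.getInputStream(e), "UTF-8")
        try src.mkString finally src.close()
      }
    } finally jar.close()
  }

  def main(args: Array[String]): Unit =
    readEntry(args(0)) match {
      case None => println(s"$Entry not found in ${args(0)}")
      case Some(content) =>
        val ps = providers(content)
        println(s"${ps.size} provider(s); Delta listed: ${ps.contains(DeltaProvider)}")
    }
}
```

Running it against the assembled jar distinguishes "the service file is missing" from "the service file exists but does not mention Delta", which are different packaging problems.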