sharmamayank94 opened 1 week ago
@sharmamayank94 Your Spark version doesn't match your Hudi Spark bundle version. Check the Quickstart - https://hudi.apache.org/docs/quick-start-guide/
Use --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.1 or --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.14.1 according to your Spark version.
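The bundle coordinate can be derived mechanically from the Spark minor version. A minimal shell sketch (the hard-coded SPARK_VERSION is a placeholder for whatever `spark-submit --version` reports on your cluster):

```shell
# Derive the matching Hudi bundle artifact from the Spark version.
# SPARK_VERSION is hard-coded here for illustration; in practice take it
# from `spark-submit --version` or your cluster configuration.
SPARK_VERSION=3.3.2
SPARK_MINOR=${SPARK_VERSION%.*}   # strip the patch component, e.g. 3.3
HUDI_BUNDLE="org.apache.hudi:hudi-spark${SPARK_MINOR}-bundle_2.12:0.14.1"
echo "$HUDI_BUNDLE"
# Then pass it to spark-submit, e.g.:
#   spark-submit --packages "$HUDI_BUNDLE" ...
```

The key point is that the `sparkX.Y` part of the bundle name must match the Spark minor version actually running, not just the major version.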
Thanks Aditya for the reply!
Used package org.apache.hudi:hudi-spark3.5-bundle_2.12
at version 0.14.1 and now got the following error:
org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: hoodie-parquet. Please find packages at `https://spark.apache.org/third-party-projects.html`.
at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:725)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:100)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:99)
at org.apache.spark.sql.execution.datasources.DataSource.providingInstance(DataSource.scala:113)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
at org.apache.hudi.BaseFileOnlyRelation.toHadoopFsRelation(BaseFileOnlyRelation.scala:206)
at org.apache.hudi.DefaultSource$.resolveBaseFileOnlyRelation(DefaultSource.scala:333)
at org.apache.hudi.DefaultSource$.createRelation(DefaultSource.scala:264)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:117)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:73)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
at com.amazon.insights.reader.HudiReader.$anonfun$readFull$1(HudiReader.scala:114)
at scala.util.Try$.apply(Try.scala:213)
at com.amazon.insights.reader.HudiReader.readFull(HudiReader.scala:114)
at com.amazon.insights.reader.HudiReader.readHistorical(HudiReader.scala:103)
at com.amazon.insights.reader.HudiReader.readDataFrame(HudiReader.scala:38)
at com.amazon.insights.spark.MyAggregationSparkJob.$anonfun$run$1(MyAggregationSparkJob.scala:57)
at com.amazon.insights.spark.MyAggregationSparkJob.$anonfun$run$1$adapted(MyAggregationSparkJob.scala:44)
at scala.collection.immutable.List.foreach(List.scala:431)
at com.amazon.insights.spark.MyAggregationSparkJob.run(MyAggregationSparkJob.scala:44)
at com.amazon.insights.DataAggregationApplication$.main(DataAggregationApplication.scala:30)
at com.amazon.insights.DataAggregationApplication.main(DataAggregationApplication.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)
Caused by: java.lang.ClassNotFoundException: hoodie-parquet.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:633)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:633)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:633)
... 31 more
@sharmamayank94 Hudi 0.14.x doesn't support Spark 3.5. Either use the newly released Hudi 0.15.0, or stay on Spark 3.3 or 3.4.
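For Spark 3.5 with Hudi 0.15.0, a submit command along these lines should work (a sketch, not a tested invocation: the application jar and main class are placeholders, and the `--conf` settings are the ones recommended in the Hudi quick-start guide):

```shell
# Sketch: submitting an application with the Spark 3.5 bundle of Hudi 0.15.0.
# Replace com.example.Main and app.jar with your application's class and jar.
spark-submit \
  --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --class com.example.Main \
  app.jar
```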
@ad1happy2go I attempted Hudi 0.14.x with Spark 3.4 as well, and still received the same "Failed to find the data source: hoodie-parquet" error.
Can you share the command you are using to submit jobs, and the entire stack trace you get now? I hope you are no longer getting java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark2Adapter.
If you are still getting that, then the Hudi version you are using is not correct.
@sharmamayank94 Were you able to get this resolved, or are you still facing the issue? Please let us know.
@sharmamayank94 Which Java version are you using? We see this issue in clustering when using Java 11.
Describe the problem you faced
Migrated from Hudi 0.8 to Hudi 0.14.0. While attempting to read a Hudi table in parquet format from AWS S3 using Spark, I receive the following error:
org.apache.hudi.internal.schema.HoodieSchemaException: Failed to convert avro schema to struct type:
The underlying cause looks like the following ClassNotFoundException:
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark2Adapter
Note: The version of Hudi which wrote the table is also 0.14.0.
Attached stacktrace at the bottom.
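As a diagnostic: a ClassNotFoundException for Spark2Adapter usually means a Spark 2 (or otherwise mismatched) Hudi bundle is on the classpath. Listing the adapter classes inside the bundle jar can confirm which Spark line it was built for; the jar filename below is a placeholder for the bundle actually shipped with the job:

```shell
# List the SparkAdapter implementations shipped in the bundle jar.
# A Spark 3.x bundle ships a Spark3_*Adapter (e.g. Spark3_3Adapter);
# only a Spark 2 bundle would ship Spark2Adapter.
jar tf hudi-spark3.3-bundle_2.12-0.14.0.jar | grep 'org/apache/spark/sql/adapter'
```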
Expected behavior
Expected Spark to read the Hudi table and convert it to the required schema/DataFrame.
Environment Description
Hudi version : 0.14.0
Spark version : Tried with both 3.5 and 3.3
Hive version : 3.1.3
Hadoop version : 3.0.0
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No
EMR version: 7.0.0