Closed — soumilshah1995 closed this issue 10 months ago
Discussion: you can find the Slack post here: https://apache-hudi.slack.com/archives/C4D716NPQ/p1702936427363959?thread_ts=1694714610.810039&cid=C4D716NPQ
@soumilshah1995 You are downloading the wrong Hudi bundle jar. Glue 4 runs Spark 3.3, so you should download the one here: https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3.3-bundle_2.12/0.14.0
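If in doubt, you can confirm the Spark version the job actually runs before picking a bundle. A minimal check inside a Glue 4 PySpark script:

```python
from pyspark.sql import SparkSession

# Glue 4.0 should report Spark 3.3.x; the Hudi bundle's Spark line must match this.
spark = SparkSession.builder.getOrCreate()
print(spark.version)
```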
I will try this out
I'm still encountering the same issue. I downloaded the JAR files and uploaded them to S3 as instructed in the ticket. I also utilized the Glue job mentioned earlier, but the error persists.
: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to convert struct type to avro schema: StructType(StructField(emp_id,LongType,true), StructField(employee_name,StringType,true), StructField(department,StringType,true), StructField(state,StringType,true), StructField(salary,LongType,true), StructField(age,LongType,true), StructField(bonus,LongType,true), StructField(ts,LongType,true))
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:147)
at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:276)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apac
Can you paste the entire stack trace you are getting now? Last time it was clearly saying that the Spark3_1Adapter class was not found, because we were using the spark3 bundle jar, which is built for Spark 3.1, while Glue 4 has Spark 3.3.
Sure
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:173)
at org.apache.logging.log4j.util.LoaderUtil.newInstanceOf(LoaderUtil.java:211)
at org.apache.logging.log4j.util.LoaderUtil.newCheckedInstanceOf(LoaderUtil.java:232)
at org.apache.logging.log4j.core.util.Loader.newCheckedInstanceOf(Loader.java:301)
at org.apache.logging.log4j.core.lookup.Interpolator.<init>(Interpolator.java:95)
at org.apache.logging.log4j.core.config.AbstractConfiguration.<init>(AbstractConfiguration.java:114)
at org.apache.logging.log4j.core.config.DefaultConfiguration.<init>(DefaultConfiguration.java:55)
at org.apache.logging.log4j.core.layout.PatternLayout$Builder.build(PatternLayout.java:430)
at org.apache.logging.log4j.core.layout.PatternLayout.createDefaultLayout(PatternLayout.java:324)
at org.apache.logging.log4j.core.appender.ConsoleAppender$Builder.<init>(ConsoleAppender.java:121)
at org.apache.logging.log4j.core.appender.ConsoleAppender.newBuilder(ConsoleAppender.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.createBuilder(PluginBuilder.java:158)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:119)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:813)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:753)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:745)
at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:389)
at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:169)
at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:181)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:446)
at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:520)
at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:536)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:214)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
at org.apache.logging.log4j.LogManager.getLogger(LogManager.java:597)
at org.apache.spark.metrics.sink.MetricsConfigUtils.<clinit>(MetricsConfigUtils.java:12)
at org.apache.spark.metrics.sink.MetricsProxyInfo.fromConfig(MetricsProxyInfo.java:17)
at com.amazonaws.services.glue.cloudwatch.CloudWatchLogsAppenderCommon.<init>(CloudWatchLogsAppenderCommon.java:62)
at com.amazonaws.services.glue.cloudwatch.CloudWatchLogsAppenderCommon$CloudWatchLogsAppenderCommonBuilder.build(CloudWatchLogsAppenderCommon.java:79)
at com.amazonaws.services.glue.cloudwatch.CloudWatchAppender.activateOptions(CloudWatchAppender.java:73)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.slf4j.impl.Log4jLoggerFactory.<init>(Log4jLoggerFactory.java:66)
at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:72)
at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:45)
at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
at org.apache.spark.network.util.JavaUtils.<clinit>(JavaUtils.java:41)
at org.apache.spark.internal.config.ConfigHelpers$.byteFromString(ConfigBuilder.scala:67)
at org.apache.spark.internal.config.ConfigBuilder.$anonfun$bytesConf$1(ConfigBuilder.scala:259)
at org.apache.spark.internal.config.ConfigBuilder.$anonfun$bytesConf$1$adapted(ConfigBuilder.scala:259)
at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
at org.apache.spark.internal.config.TypedConfigBuilder.createWithDefault(ConfigBuilder.scala:144)
at org.apache.spark.internal.config.package$.<init>(package.scala:345)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.set(SparkConf.scala:94)
at org.apache.spark.SparkConf.$anonfun$loadFromSystemProperties$3(SparkConf.scala:76)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:788)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:230)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:461)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:461)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:787)
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:75)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:70)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.getSparkConf(ProcessLauncher.scala:45)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.getSparkConf$(ProcessLauncher.scala:44)
at com.amazonaws.services.glue.ProcessLauncher$$anon$1.getSparkConf(ProcessLauncher.scala:200)
at com.amazonaws.services.glue.ProcessLauncher.<init>(ProcessLauncher.scala:209)
at com.amazonaws.services.glue.ProcessLauncher.<init>(ProcessLauncher.scala:201)
at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:33)
at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
2023-12-19 14:12:00,004 main INFO Log4j appears to be running in a Servlet environment, but there's no log4j-web module available. If you want better web container support, please add the log4j-web JAR to your web archive or server lib directory.
+------+------------------+----------+-----+------+---+-----+----------+
|emp_id| employee_name|department|state|salary|age|bonus| ts|
+------+------------------+----------+-----+------+---+-----+----------+
| 0| Shannon Frederick| HR| TX|105023| 40|49326| 458792232|
| 1| Kelly Christian| IT| IL| 60827| 34|30899|1438189473|
| 2| Nancy Thompson| Marketing| RJ|112698| 45|15104| 366953525|
| 3| Xavier Crane| Sales| RJ|146419| 32|95363| 902407760|
| 4| Amanda Castillo| HR| NY| 60163| 29|51515| 473773911|
| 5| Ashley Wright| IT| TX|141370| 32|44223| 615176188|
| 6|Stephanie Sandoval| Sales| CA|128303| 34|74014| 4721797|
| 7| Danielle Mccarthy| IT| CA| 22041| 49|36731|1405027762|
| 8| Hannah Pearson| Marketing| NY| 85896| 21|90073|1357045062|
| 9| Michael Lynch| HR| NY| 38922| 21| 7710|1466663385|
+------+------------------+----------+-----+------+---+-----+----------+
None
Upserting Hudi
Failed 1
An error occurred while calling o128.save.
: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to convert struct type to avro schema: StructType(StructField(emp_id,LongType,true), StructField(employee_name,StringType,true), StructField(department,StringType,true), StructField(state,StringType,true), StructField(salary,LongType,true), StructField(age,LongType,true), StructField(bonus,LongType,true), StructField(ts,LongType,true))
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:147)
at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:276)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hudi.SparkAdapterSupport$.sparkAdapter$lzycompute(SparkAdapterSupport.scala:49)
at org.apache.hudi.SparkAdapterSupport$.sparkAdapter(SparkAdapterSupport.scala:35)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:29)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:66)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:66)
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:143)
... 41 more
Upserting Hudi
Failed 1
An error occurred while calling o141.save.
: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to convert struct type to avro schema: StructType(StructField(emp_id,LongType,true), StructField(employee_name,StringType,true), StructField(department,StringType,true), StructField(state,StringType,true), StructField(salary,LongType,true), StructField(age,LongType,true), StructField(bonus,LongType,true), StructField(ts,LongType,true))
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:147)
at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:276)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hudi.SparkAdapterSupport$.sparkAdapter$lzycompute(SparkAdapterSupport.scala:49)
at org.apache.hudi.SparkAdapterSupport$.sparkAdapter(SparkAdapterSupport.scala:35)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:29)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:66)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:66)
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:143)
... 41 more
I will be making a video for the community on using Hudi 0.14 on Glue :D Cheers, happy holidays!
Thanks @soumilshah1995
Anytime @ad1happy2go
You guys are real champs; I'm just trying to learn more from you :D
Guys, any pointers on how to provide these jars when provisioning with CloudFormation templates? The CloudFormation template doesn't have an explicit option for "dependent jars", and I'm not sure whether the "--extra-jars" parameter can be used for the same purpose. Also, the Glue docs state that when providing jars for a different Hudi version, "--datalake-formats: hudi" should not be provided (mentioning this in case it's helpful).
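For what it's worth, CloudFormation's AWS::Glue::Job resource accepts a DefaultArguments map, so --extra-jars should be settable there. Below is an untested boto3 sketch of the equivalent job definition; the job name, role ARN, and S3 paths are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Untested sketch: DefaultArguments here maps to the DefaultArguments
# property of CloudFormation's AWS::Glue::Job, so the same keys should work there.
glue.create_job(
    Name="hudi-014-demo",                                    # placeholder
    Role="arn:aws:iam::123456789012:role/GlueJobRole",       # placeholder
    GlueVersion="4.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://XXX/scripts/job.py",         # placeholder
        "PythonVersion": "3",
    },
    DefaultArguments={
        "--extra-jars": "s3://XXX/jar/hudi-spark3.3-bundle_2.12-0.14.0.jar",
        "--additional-python-modules": "faker==11.3.0",
        # Per the Glue docs note above, do NOT also set --datalake-formats=hudi.
    },
)
```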
I use serverless, which makes it very easy.
@kamaagar I have been following up with the AWS teams, and no one suggests bringing your own JAR. EMR 6+ versions have their own Hudi jar in place, so my suggestion is: don't waste time bringing in any extra jar. I am facing the same problem and checking how to fix it.
Hudi 0.14 on AWS Glue
Overview
This project aims to run Hudi 0.14 on AWS Glue, leveraging Glue 4. The default Glue setup supports Hudi, but only an older version; the objective is to use the specified Hudi version with Glue 4.
Prerequisites
Setup Instructions
Dependent JARs path:
s3://XXX/jar/hudi-spark3-bundle_2.12-0.14.0.jar,s3://XXX/jar/spark-avro_2.12-3.5.0.jar
Job parameters info:
--extra-jars: s3://jXXX/jar/hudi-spark3-bundle_2.12-0.14.0.jar,s3://jt-dev-datateam/jar/spark-avro_2.12-3.5.0.jar
--additional-python-modules: faker==11.3.0
Test code
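The original test script is not included here; the following is a minimal sketch of the kind of upsert the job performs, reconstructed from the schema in the errors above (the table name, record key field, and S3 path are assumed placeholders):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)

# Columns match the StructType reported in the HoodieSchemaException above.
df = spark.createDataFrame(
    [(0, "Shannon Frederick", "HR", "TX", 105023, 40, 49326, 458792232)],
    ["emp_id", "employee_name", "department", "state", "salary", "age", "bonus", "ts"],
)

hudi_options = {
    "hoodie.table.name": "employees",                     # placeholder
    "hoodie.datasource.write.recordkey.field": "emp_id",  # assumed key field
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}

print("Upserting Hudi")
(
    df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://XXX/hudi/employees/")                     # placeholder path
)
```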
Error