apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Running compaction gives java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem #11884

Closed: alberttwong closed this issue 2 months ago

alberttwong commented 2 months ago

Running compaction gives java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem

To Reproduce

Steps to reproduce the behavior:

hudi:stock_ticks_mor->compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
36171 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
36266 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkMain: Invoking SparkMain: COMPACT_SCHEDULE
36288 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkContext: Running Spark version 3.4.3
36303 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceUtils: ==============================================================
36303 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceUtils: No custom resources configured for spark.driver.
36303 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceUtils: ==============================================================
36304 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkContext: Submitted application: hoodie-cli-COMPACT_SCHEDULE
36318 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
36325 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceProfile: Limiting resource is cpu
36325 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO ResourceProfileManager: Added ResourceProfile id: 0
36362 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SecurityManager: Changing view acls to: root
36362 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SecurityManager: Changing modify acls to: root
36362 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SecurityManager: Changing view acls groups to: 
36362 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SecurityManager: Changing modify acls groups to: 
36362 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
36375 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
36375 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
36375 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
36492 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO Utils: Successfully started service 'sparkDriver' on port 32843.
36515 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkEnv: Registering MapOutputTracker
36569 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkEnv: Registering BlockManagerMaster
36580 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
36580 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
36583 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
36597 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0a3993c6-3100-45e5-8f10-dd136154d7e3
36607 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
36616 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkEnv: Registering OutputCommitCoordinator
36702 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
36738 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
36756 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkContext: Added JAR file:///opt/hudicli/hudi-spark-bundle_2.12-0.15.0.jar at spark://openjdk8:32843/jars/hudi-spark-bundle_2.12-0.15.0.jar with timestamp 1725499909453
36756 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO SparkContext: Added JAR file:/opt/hudicli/hudi-cli-bundle_2.12-0.15.0.jar at spark://openjdk8:32843/jars/hudi-cli-bundle_2.12-0.15.0.jar with timestamp 1725499909453
36797 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO Executor: Starting executor ID driver on host openjdk8
36800 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
36807 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:49 INFO Executor: Fetching spark://openjdk8:32843/jars/hudi-cli-bundle_2.12-0.15.0.jar with timestamp 1725499909453
36833 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO TransportClientFactory: Successfully created connection to openjdk8/172.18.0.12:32843 after 15 ms (0 ms spent in bootstraps)
36837 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Utils: Fetching spark://openjdk8:32843/jars/hudi-cli-bundle_2.12-0.15.0.jar to /tmp/spark-547dfa1f-a0b1-49b3-a4fc-3b2883274b0f/userFiles-33c2bfff-3d41-4869-986c-4d83842518a6/fetchFileTemp4244620261968792226.tmp
36919 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Executor: Adding file:/tmp/spark-547dfa1f-a0b1-49b3-a4fc-3b2883274b0f/userFiles-33c2bfff-3d41-4869-986c-4d83842518a6/hudi-cli-bundle_2.12-0.15.0.jar to class loader
36919 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Executor: Fetching spark://openjdk8:32843/jars/hudi-spark-bundle_2.12-0.15.0.jar with timestamp 1725499909453
36919 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Utils: Fetching spark://openjdk8:32843/jars/hudi-spark-bundle_2.12-0.15.0.jar to /tmp/spark-547dfa1f-a0b1-49b3-a4fc-3b2883274b0f/userFiles-33c2bfff-3d41-4869-986c-4d83842518a6/fetchFileTemp3511139841596004747.tmp
37040 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Executor: Adding file:/tmp/spark-547dfa1f-a0b1-49b3-a4fc-3b2883274b0f/userFiles-33c2bfff-3d41-4869-986c-4d83842518a6/hudi-spark-bundle_2.12-0.15.0.jar to class loader
37049 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44907.
37049 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO NettyBlockTransferService: Server created on openjdk8:44907
37051 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
37057 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, openjdk8, 44907, None)
37060 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManagerMasterEndpoint: Registering block manager openjdk8:44907 with 366.3 MiB RAM, BlockManagerId(driver, openjdk8, 44907, None)
37061 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, openjdk8, 44907, None)
37062 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, openjdk8, 44907, None)
37190 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
37190 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
37190 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
37190 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
37199 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
37201 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
37205 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
37205 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
37205 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
37228 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
37228 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
37228 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
37228 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 ERROR SparkMain: Fail to execute commandString
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - org.apache.hudi.exception.HoodieException: Unable to create org.apache.hudi.storage.hadoop.HoodieHadoopStorage
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:44)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.table.HoodieTableMetaClient.getStorage(HoodieTableMetaClient.java:309)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:81)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:770)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.UtilHelpers.createMetaClient(UtilHelpers.java:603)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.HoodieCompactor.<init>(HoodieCompactor.java:71)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:366)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:176)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at java.lang.reflect.Method.invoke(Method.java:498)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.storage.hadoop.HoodieHadoopStorage
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75)
37230 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.storage.HoodieStorageUtils.getStorage(HoodieStorageUtils.java:41)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 19 more
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: java.lang.reflect.InvocationTargetException
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:73)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 20 more
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:116)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.hadoop.fs.HadoopFSUtils.getFs(HadoopFSUtils.java:109)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.<init>(HoodieHadoopStorage.java:63)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 25 more
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 35 more
37231 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO SparkContext: SparkContext is stopping with exitCode 0.
37237 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO SparkUI: Stopped Spark web UI at http://openjdk8:4040
37246 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
37251 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO MemoryStore: MemoryStore cleared
37251 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManager: BlockManager stopped
37255 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO BlockManagerMaster: BlockManagerMaster stopped
37257 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
37266 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO SparkContext: Successfully stopped SparkContext
37267 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO ShutdownHookManager: Shutdown hook called
37268 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-547dfa1f-a0b1-49b3-a4fc-3b2883274b0f
37269 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 01:31:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-844f1a50-3bb3-4b59-9064-85ccb3ab7fa3
Failed to run compaction for 20240905013148418
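The root cause in the stacktrace above is that the S3A connector (`org.apache.hadoop.fs.s3a.S3AFileSystem`) is not on the classpath of the Spark job the CLI launches. A quick way to check, assuming Spark is installed under /spark as in this environment, is to look for the connector jars in Spark's jar directory:

```shell
# List any S3A connector jars present in a Spark jars directory.
# The /spark/jars location is an assumption based on this container's layout.
check_s3a_jars() {
  local jars_dir="$1"
  # Both jars are needed: hadoop-aws provides S3AFileSystem, and the
  # AWS SDK bundle provides the S3 client classes it depends on.
  ls "$jars_dir"/hadoop-aws-*.jar "$jars_dir"/aws-java-sdk-bundle-*.jar 2>/dev/null
}

if [ -z "$(check_s3a_jars /spark/jars)" ]; then
  echo "S3A connector jars missing from /spark/jars"
fi
```

If the check prints nothing for either jar, any job touching an `s3a://` path will fail with this ClassNotFoundException.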

alberttwong commented 2 months ago

You need to add the two Hadoop-related jars (the hadoop-aws connector and the AWS SDK bundle) into /spark/jars:

root@openjdk8:/spark/jars# cp /opt/hudisync/aws-java-sdk-bundle-1.11.271.jar .
root@openjdk8:/spark/jars# cp /opt/hudisync/hadoop-aws-2.10.2.jar .
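An alternative to copying the jars into /spark/jars is to pass them per job via spark-submit's `--jars` flag. This is only a sketch: the utilities-bundle jar path and the HoodieCompactor flags shown are assumptions, so verify them against your Hudi version before running:

```shell
# Hypothetical helper that assembles (but does not run) a spark-submit command
# for scheduling compaction with the S3A jars supplied explicitly.
# Jar paths and versions below are taken from this environment and may differ.
build_compactor_cmd() {
  local base_path="$1" table_name="$2"
  local s3a_jars="/opt/hudisync/hadoop-aws-2.10.2.jar,/opt/hudisync/aws-java-sdk-bundle-1.11.271.jar"
  local cmd="spark-submit --class org.apache.hudi.utilities.HoodieCompactor --jars ${s3a_jars} /opt/hudicli/hudi-utilities-bundle_2.12-0.15.0.jar --base-path ${base_path} --table-name ${table_name} --mode schedule"
  printf '%s\n' "$cmd"
}

build_compactor_cmd "s3a://warehouse/stock_ticks_mor" "stock_ticks_mor"
```

Note that hadoop-aws and the AWS SDK bundle versions must match each other and the Hadoop version Spark was built against (hadoop-aws 2.10.2 pairs with aws-java-sdk-bundle 1.11.271); mismatched versions produce NoSuchMethodError instead of ClassNotFoundException.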