apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] compaction org.apache.hudi.exception.HoodieLockException: Unsupported scheme :s3a #11886

Closed: alberttwong closed this issue 1 month ago

alberttwong commented 1 month ago

Running compaction on a table in an s3a bucket fails with a HoodieLockException.

To Reproduce

Steps to reproduce the behavior:

Run the following compaction command to get the error:

compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
22041 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
22041 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
22041 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT
22041 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
22042 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
22042 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
22042 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO TransactionManager: Transaction ending with transaction owner Option{val=[==>20240905020211347__compaction__REQUESTED]}
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
22043 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT
22044 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO BaseHoodieClient: Stopping Timeline service !!
22044 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO EmbeddedTimelineService: Closing Timeline server
22044 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO TimelineService: Closing Timeline Service
22044 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO Javalin: Stopping Javalin ...
22051 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO Javalin: Javalin has stopped
22051 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO TimelineService: Closed Timeline Service
22051 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO EmbeddedTimelineService: Closed Timeline server
22052 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO TransactionManager: Transaction manager closed
22053 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 INFO TransactionManager: Transaction manager closed
22054 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 02:02:13 ERROR UtilHelpers: Compact failed
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:125)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:112)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:70)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableService(BaseHoodieTableServiceClient.java:609)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService(BaseHoodieWriteClient.java:1216)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant(BaseHoodieWriteClient.java:968)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.HoodieCompactor.doSchedule(HoodieCompactor.java:287)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:196)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:621)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.utilities.HoodieCompactor.compact(HoodieCompactor.java:192)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:366)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:176)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
22055 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at java.lang.reflect.Method.invoke(Method.java:498)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: java.lang.reflect.InvocationTargetException
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:73)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 24 more
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: org.apache.hudi.exception.HoodieLockException: Unsupported scheme :s3a, since this fs can not support atomic creation
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       at org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.<init>(FileSystemBasedLockProvider.java:90)
22056 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -       ... 29 more
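The last `Caused by` line shows that FileSystemBasedLockProvider rejects the s3a scheme outright. The check can be sketched in shell as below, assuming (per the description of `hoodie.fs.atomic_creation.support` in the Hudi configuration docs) that only hdfs, file and viewfs are accepted by default and that the config adds extra schemes as a comma-separated list. `path_scheme` and `supports_atomic_creation` are hypothetical helpers, not Hudi code, and `s3a://bucket/table` is a placeholder path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the scheme check behind
# "Unsupported scheme :s3a, since this fs can not support atomic creation".
# Assumption: hdfs, file and viewfs are accepted by default, and
# hoodie.fs.atomic_creation.support lists extra schemes, comma-separated.
# This mirrors the documented behaviour; it is not Hudi's actual code.

# Print the URI scheme of a path, treating a path without a scheme as "file".
path_scheme() {
  local path="$1"
  if [[ "$path" == *://* ]]; then
    printf '%s\n' "${path%%://*}"
  else
    printf 'file\n'
  fi
}

# Return 0 if the lock provider would accept the path's scheme.
# $1: table base path
# $2: value of hoodie.fs.atomic_creation.support (may be empty)
supports_atomic_creation() {
  local scheme extra s
  scheme="$(path_scheme "$1")"
  extra="${2:-}"
  # Split the configured value on commas only.
  local IFS=','
  for s in hdfs file viewfs $extra; do
    if [[ "$s" == "$scheme" ]]; then
      return 0
    fi
  done
  return 1
}
```

With an empty second argument, `supports_atomic_creation s3a://bucket/table ""` fails, which matches the error above; passing `s3a` makes it succeed. Note that the config's own description names S3 as a file system without atomic file creation, so listing `s3a` there only bypasses the check.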

Expected behavior

Compaction should be scheduled without any error.

alberttwong commented 1 month ago

Setting these environment variables doesn't work, even though the logs show that they were picked up:

export HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT=s3a
export HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT=s3a

alberttwong commented 1 month ago

Related: https://github.com/apache/hudi/issues/11036

alberttwong commented 1 month ago

This doesn't work either:

export HOODIE_ENV_hoodie_DOT_fs_DOT_atomic_creation_DOT_support=s3a
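The `Picking up value for hoodie env var` log lines suggest that each `HOODIE_ENV_*` variable is copied into the Hadoop Configuration under a rewritten key. Assuming the rewrite drops the `HOODIE_ENV_` prefix and turns every `_DOT_` into `.` (our reading of Hudi's HadoopFSUtils, not verified against the version used here), the three attempts map as shown below. `hoodie_env_to_conf_key` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hudi): rewrite a HOODIE_ENV_* variable name
# into the Hadoop Configuration key it is assumed to become.
hoodie_env_to_conf_key() {
  local name="${1#HOODIE_ENV_}"    # assumption: the HOODIE_ENV_ prefix is dropped
  printf '%s\n' "${name//_DOT_/.}" # assumption: every _DOT_ becomes "."
}

# The three variable names tried in this thread.
for var in HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT \
           HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT \
           HOODIE_ENV_hoodie_DOT_fs_DOT_atomic_creation_DOT_support; do
  printf '%s -> %s\n' "$var" "$(hoodie_env_to_conf_key "$var")"
done
```

Under that assumption only the last name yields `hoodie.fs.atomic_creation.support`; the first two become `FS_ATOMIC_CREATION_SUPPORT` and `HOODIE_FS_ATOMIC_CREATION_SUPPORT`. Even the correct key would land in the Hadoop Configuration, so if the lock provider reads the setting from the Hudi write properties instead, none of these exports would have any effect.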
majian1998 commented 1 month ago

It seems that your settings are not taking effect. If you don't need locks, you can try setting hoodie.write.lock.provider to org.apache.hudi.client.transaction.lock.InProcessLockProvider.

alberttwong commented 1 month ago

I'm not sure whether I should change https://hudi.apache.org/docs/configurations/#hoodiewritelockprovider, since its default is org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider. I tried it anyway:

export HOODIE_ENV_hoodie_DOT_write_DOT_lock_DOT_provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
21748 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO TransactionManager: Transaction starting for Option{val=[==>20240905041525278__compaction__REQUESTED]} with latest completed transaction instant Optional.empty
21748 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_write_DOT_lock_DOT_provider
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
21749 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
21750 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_fs_DOT_atomic_creation_DOT_support
21750 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO TransactionManager: Transaction ending with transaction owner Option{val=[==>20240905041525278__compaction__REQUESTED]}
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_write_DOT_lock_DOT_provider
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_fs_DOT_atomic_creation_DOT_support
21751 [Thread-5] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 04:15:27 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT

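The logs show that the setting isn't applied: HOODIE_ENV_hoodie_DOT_write_DOT_lock_DOT_provider is picked up, but LockManager still uses FileSystemBasedLockProvider. I don't know why.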

alberttwong commented 1 month ago

Workaround:

export HUDI_CONF_DIR=/opt/hudi/packaging/hudi-cli-bundle/conf/

Then add this line to hudi-defaults.conf in that directory:

hoodie.fs.atomic_creation.support                s3a
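The workaround can be scripted so that rerunning it does not pile up duplicate entries. This is a minimal sketch: `set_hudi_default` is a hypothetical helper, and it assumes hudi-defaults.conf accepts one `key value` (or `key=value`) entry per line, as in the line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key value" in a hudi-defaults.conf file,
# replacing any existing entry for the same key instead of appending a duplicate.
set_hudi_default() {
  local conf_file="$1" key="$2" value="$3" tmp
  touch "$conf_file" || return 1
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing "key value" or "key=value" entry for this key.
  awk -v k="$key" '{ split($1, parts, "="); if (parts[1] != k) print }' "$conf_file" > "$tmp"
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  # Copy back instead of mv so the original file's permissions are kept.
  cat "$tmp" > "$conf_file"
  rm -f "$tmp"
}
```

Usage, with the directory from this thread: `export HUDI_CONF_DIR=/opt/hudi/packaging/hudi-cli-bundle/conf/` followed by `set_hudi_default "$HUDI_CONF_DIR/hudi-defaults.conf" hoodie.fs.atomic_creation.support s3a`.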
alberttwong commented 1 month ago

Another possible cause is running this on JDK 11 instead of JDK 8.