apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Document] Remove a `fs.oss.credentials.provider` option #7507

Closed: loustler closed this pull request 2 months ago.

loustler commented 2 months ago

Purpose of this pull request

Remove the `fs.oss.credentials.provider` option from the checkpoint storage configuration for OSS. When this option is passed into the Hadoop configuration, hadoop-aliyun tries to instantiate the named class through a constructor that accepts a URI and a Configuration, falls back to a no-argument constructor, and throws an exception, because the `AliyunCredentialsProvider` class has only a single constructor, which takes a Configuration. The current documentation confuses and misleads users like me, so we need to fix it.

The hadoop-aliyun source code that loads the credentials provider, per release:

Hadoop 3.1.4

https://github.com/apache/hadoop/blob/1e877761e8dadd71effef30e592368f7fe66a61b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSUtils.java#L105-L141

https://github.com/apache/hadoop/blob/1e877761e8dadd71effef30e592368f7fe66a61b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunCredentialsProvider.java#L35-L67

Hadoop 3.3.6

https://github.com/apache/hadoop/blob/1be78238728da9266a4f88195058f08fd012bf9c/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSUtils.java#L106-L143

https://github.com/apache/hadoop/blob/1be78238728da9266a4f88195058f08fd012bf9c/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunCredentialsProvider.java#L35-L67
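To make the failure mode concrete, here is a minimal, self-contained Java sketch of the lookup order that the error message in the stack trace describes: first a constructor taking (URI, Configuration), then a no-argument constructor. It does not use Hadoop; `Config`, `ConfigOnlyProvider`, `UriAndConfigProvider`, and `resolveProvider` are hypothetical stand-ins, and the exact reflection code in hadoop-aliyun may differ between the releases linked above.

```java
import java.lang.reflect.Constructor;
import java.net.URI;

public class ProviderLookupSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static class Config {}

    /** Mimics AliyunCredentialsProvider: its only constructor takes a Config. */
    static class ConfigOnlyProvider {
        ConfigOnlyProvider(Config conf) {}
    }

    /** A hypothetical provider that the lookup can resolve via (URI, Config). */
    static class UriAndConfigProvider {
        UriAndConfigProvider(URI uri, Config conf) {}
    }

    /**
     * Resolves a constructor in the order the error message describes:
     * (URI, Config) first, then no-arg. Throws if neither exists.
     */
    static Constructor<?> resolveProvider(Class<?> cls) {
        try {
            return cls.getDeclaredConstructor(URI.class, Config.class);
        } catch (NoSuchMethodException first) {
            try {
                // Fallback: this is the call that fails for ConfigOnlyProvider,
                // analogous to "NoSuchMethodException ...<init>()" in the trace.
                return cls.getDeclaredConstructor();
            } catch (NoSuchMethodException second) {
                throw new IllegalStateException(cls.getName()
                        + " must provide a (URI, Config) or no-arg constructor", second);
            }
        }
    }

    public static void main(String[] args) {
        // Resolves the two-argument constructor.
        System.out.println(resolveProvider(UriAndConfigProvider.class).getParameterCount());
        try {
            resolveProvider(ConfigOnlyProvider.class);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getCause());
        }
    }
}
```

As far as the linked sources show, when `fs.oss.credentials.provider` is left unset, hadoop-aliyun constructs `AliyunCredentialsProvider` directly with the Configuration instead of going through this reflective lookup, which is why removing the line is enough.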

Error stacktrace

2024-08-27 16:43:55,635 WARN  [Log4j2HttpPostCommandProcessor] [hz.main.cached.thread-3] - [192.168.106.239]:5801 [my-seatunnel] [5.1] An error occurred while handling request HttpCommand [HTTP_POST]{uri='/hazelcast/rest/maps/submit-job?jobName=my-job'}AbstractTextCommand[HTTP_POST]{requestId=0}
java.util.concurrent.CompletionException: org.apache.seatunnel.engine.common.exception.JobException: org.apache.seatunnel.engine.checkpoint.storage.exception.CheckpointStorageException: Failed to get file system
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.initStorage(HdfsStorage.java:70)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.<init>(HdfsStorage.java:57)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsFileStorageInstance.getOrCreateStorage(HdfsFileStorageInstance.java:53)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorageFactory.create(HdfsStorageFactory.java:75)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.<init>(CheckpointManager.java:105)
    at org.apache.seatunnel.engine.server.master.JobMaster.initCheckPointManager(JobMaster.java:288)
    at org.apache.seatunnel.engine.server.master.JobMaster.init(JobMaster.java:271)
    at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$4(CoordinatorService.java:499)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider constructor exception.  A class specified in fs.oss.credentials.provider must provide an accessible constructor accepting URI and Configuration, or an accessible default constructor.
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSUtils.getCredentialsProvider(AliyunOSSUtils.java:132)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:155)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:349)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:554)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:290)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.initStorage(HdfsStorage.java:68)
    ... 12 more
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider.<init>()
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getDeclaredConstructor(Class.java:2178)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSUtils.getCredentialsProvider(AliyunOSSUtils.java:126)
    ... 18 more

    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375) ~[?:1.8.0_392]
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947) ~[?:1.8.0_392]
    at org.apache.seatunnel.engine.server.rest.RestHttpPostCommandProcessor.submitJob(RestHttpPostCommandProcessor.java:239) ~[seatunnel-starter.jar:2.3.7]
    at org.apache.seatunnel.engine.server.rest.RestHttpPostCommandProcessor.handleSubmitJob(RestHttpPostCommandProcessor.java:146) ~[seatunnel-starter.jar:2.3.7]
    at org.apache.seatunnel.engine.server.rest.RestHttpPostCommandProcessor.handle(RestHttpPostCommandProcessor.java:82) ~[seatunnel-starter.jar:2.3.7]
    at org.apache.seatunnel.engine.server.rest.RestHttpPostCommandProcessor.handle(RestHttpPostCommandProcessor.java:60) ~[seatunnel-starter.jar:2.3.7]
    at com.hazelcast.internal.ascii.TextCommandServiceImpl$CommandExecutor.run(TextCommandServiceImpl.java:402) ~[seatunnel-starter.jar:2.3.7]
    at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217) ~[seatunnel-starter.jar:2.3.7]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[seatunnel-starter.jar:2.3.7]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[seatunnel-starter.jar:2.3.7]
Caused by: org.apache.seatunnel.engine.common.exception.JobException: org.apache.seatunnel.engine.checkpoint.storage.exception.CheckpointStorageException: Failed to get file system
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.initStorage(HdfsStorage.java:70)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.<init>(HdfsStorage.java:57)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsFileStorageInstance.getOrCreateStorage(HdfsFileStorageInstance.java:53)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorageFactory.create(HdfsStorageFactory.java:75)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.<init>(CheckpointManager.java:105)
    at org.apache.seatunnel.engine.server.master.JobMaster.initCheckPointManager(JobMaster.java:288)
    at org.apache.seatunnel.engine.server.master.JobMaster.init(JobMaster.java:271)
    at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$4(CoordinatorService.java:499)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider constructor exception.  A class specified in fs.oss.credentials.provider must provide an accessible constructor accepting URI and Configuration, or an accessible default constructor.
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSUtils.getCredentialsProvider(AliyunOSSUtils.java:132)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:155)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:349)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:554)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:290)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.initStorage(HdfsStorage.java:68)
    ... 12 more
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider.<init>()
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getDeclaredConstructor(Class.java:2178)
    at org.apache.hadoop.fs.aliyun.oss.AliyunOSSUtils.getCredentialsProvider(AliyunOSSUtils.java:126)
    ... 18 more

    at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$4(CoordinatorService.java:506) ~[seatunnel-starter.jar:2.3.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_392]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_392]

Does this PR introduce any user-facing change?

Yes. The OSS checkpoint storage example in the documentation changes as follows:

seatunnel:
  engine:
    checkpoint:
      interval: 6000
      timeout: 7000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          oss.bucket: your-bucket
          fs.oss.accessKeyId: your-access-key
          fs.oss.accessKeySecret: your-secret-key
          fs.oss.endpoint: endpoint address
-         fs.oss.credentials.provider: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider
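Users with existing configs copied from the old documentation may want to catch the stale entry before submitting a job. Below is a hypothetical pre-submit check in Java, not part of SeaTunnel: `PluginConfigCheck` and `check` are illustrative names, and the map simply mirrors the `plugin-config` keys shown above. It only flags `AliyunCredentialsProvider`; a custom provider class with a suitable constructor is left alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PluginConfigCheck {

    // Key that the old documentation told users to set.
    static final String PROVIDER_KEY = "fs.oss.credentials.provider";
    static final String ALIYUN_PROVIDER =
            "org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider";

    /** Returns warnings for plugin-config entries known to break OSS checkpoint storage. */
    static List<String> check(Map<String, String> pluginConfig) {
        List<String> warnings = new ArrayList<>();
        String provider = pluginConfig.get(PROVIDER_KEY);
        if (provider != null && provider.trim().equals(ALIYUN_PROVIDER)) {
            warnings.add("Remove " + PROVIDER_KEY + ": " + ALIYUN_PROVIDER
                    + " has no (URI, Configuration) or no-arg constructor");
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("storage.type", "oss");
        conf.put("oss.bucket", "your-bucket");
        conf.put("fs.oss.accessKeyId", "your-access-key");
        conf.put("fs.oss.accessKeySecret", "your-secret-key");
        conf.put("fs.oss.endpoint", "endpoint address");
        System.out.println(check(conf)); // corrected example: prints []
        conf.put(PROVIDER_KEY, ALIYUN_PROVIDER);
        System.out.println(check(conf)); // old example: one warning
    }
}
```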

How was this patch tested?

Tested with release 2.3.7 against OSS.


Carl-Zhou-CN commented 2 months ago

Yes. We did clean this up in #7332, but it seems it was not fully cleaned up. Thanks @loustler! cc @Carl-Zhou-CN

@loustler Could you please search the whole project for 'org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider' and remove this configuration everywhere? Thank you very much for your help.

loustler commented 2 months ago

@Carl-Zhou-CN `fs.oss.credentials.provider` appears in some tests (screenshot).

`org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider` appears in some documents and tests (screenshot).

I found these with IntelliJ.
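Without IntelliJ, roughly the same project-wide search can be done with a short Java sketch built on `java.nio.file.Files.walk`. The class name `FindProviderReferences` is hypothetical, and scanning every regular file is a simplifying assumption; a real search would probably skip `.git` and build output directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindProviderReferences {

    /** Returns regular files under root whose text contains needle, sorted by path. */
    static List<Path> find(Path root, String needle) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> contains(p, needle))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean contains(Path file, String needle) {
        try {
            // new String(bytes, UTF_8) replaces malformed bytes instead of throwing,
            // so binary files do not abort the walk.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return text.contains(needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String needle : new String[] {
                "fs.oss.credentials.provider",
                "org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider"}) {
            System.out.println(needle + ":");
            find(root, needle).forEach(p -> System.out.println("  " + p));
        }
    }
}
```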

Should I change these as well? https://github.com/apache/seatunnel/blob/c6f627fa38d29f83bccb3e1fb86e8962b64dad4e/docs/en/seatunnel-engine/checkpoint-storage.md?plain=1#L51

https://github.com/apache/seatunnel/blob/c6f627fa38d29f83bccb3e1fb86e8962b64dad4e/docs/zh/seatunnel-engine/checkpoint-storage.md?plain=1#L49

Carl-Zhou-CN commented 2 months ago

I think it can be removed from the test cases. @Hisoka-X, what do you think?

Hisoka-X commented 2 months ago

> I think it can be removed from the test case, @Hisoka-X what do you think?

Yes.

loustler commented 2 months ago

@Carl-Zhou-CN @Hisoka-X I removed it from the test code.