apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

Support for IBM Cloud COS (Cloud Object Storage) connector in Seatunnel Zeta #5899

Closed (neehar18 closed this issue 1 year ago)

neehar18 commented 1 year ago

Description

I am looking for an Apache SeaTunnel connector for IBM Cloud COS to access a bucket I created (format: cos://<bucket_name>). I couldn't find one in the latest Apache SeaTunnel release, 2.3.3. The supported plugins list includes seatunnel.source.CosFile = connector-file-cos and seatunnel.sink.CosFile = connector-file-cos, but these seem to be specific to Tencent Cloud COS, where the bucket format is cosn://<bucket_name>.

Given that IBM Cloud COS is S3 compatible, I was wondering whether there is any way to use the existing S3File connector with IBM Cloud COS in the current 2.3.3 version. If not, are there any plans to support IBM Cloud COS in future versions?

Usage Scenario

No response

Related issues

No response

Carl-Zhou-CN commented 1 year ago

Hi @raheen1, I think the existing S3File source connector should be able to support IBM Cloud COS. Could you please test it and share your feedback?

neehar18 commented 1 year ago

Got it working with the S3File source connector, thanks

Carl-Zhou-CN commented 1 year ago

> Got it working with the S3File source connector, thanks

Good. Could you share the configuration or contribute documentation?

neehar18 commented 1 year ago

Sure, here is the example config file I used for IBM COS to Console:

# Defining the runtime environment
env {
  # You can set flink configuration here
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  S3File {
    path = "/sample.json"
    fs.s3a.endpoint = "s3.us-west.cloud-object-storage.test.appdomain.cloud"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "######"
    secret_key = "######"
    bucket = "s3a://apache-seatunnel-test-connector"
    file_format_type = "json"
    schema {
      fields {
        id = int
        name = string
      }
    }
  }
}

transform {
  # If you would like more information about how to configure SeaTunnel and to see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/category/transform-v2
}

sink {
  Console {}
}
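
For writing back to IBM COS, the same S3-compatible settings might carry over to the S3File sink. The fragment below is an untested sketch under that assumption: it reuses the endpoint, credentials provider, and bucket from the source block above, and the output path is a hypothetical placeholder. It would replace the Console sink in the job file.

```hocon
# Untested sketch: assumes the S3File sink accepts the same IBM COS
# settings that worked for the S3File source above.
sink {
  S3File {
    # Hypothetical output directory inside the bucket
    path = "/seatunnel/output"
    bucket = "s3a://apache-seatunnel-test-connector"
    fs.s3a.endpoint = "s3.us-west.cloud-object-storage.test.appdomain.cloud"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "######"
    secret_key = "######"
    file_format_type = "json"
  }
}
```

If the write fails, the endpoint and credentials provider lines are the first things to check, since they are what point the S3A client at IBM COS instead of AWS.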

Carl-Zhou-CN commented 1 year ago

Thank you very much. I will close this issue.