apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

Unsupported type error when syncing AWS S3 Parquet files with SeaTunnel 2.3.5 #6923

Closed xingfeng7788 closed 4 days ago

xingfeng7788 commented 1 month ago

What happened

I am using SeaTunnel 2.3.5 to sync Parquet files from AWS S3 and hit an unsupported-type error. Could someone help me figure out the cause?

SeaTunnel Version

2.3.5

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  S3File {
    path = "/data/result/gdt/gdt_dim_date/"
    fs.s3a.endpoint="s3.cn-north-1.amazonaws.com.cn"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    have_partition = true
    partition_by = ["cal_year_month"]
    partition_dir_expression = "${k1}=${v1}"
    access_key = "****"
    secret_key = "****"
    bucket = "s3a://**"
    file_format_type = "parquet"
    hadoop_s3_properties {
      "fs.s3a.buffer.dir" = "/root/seatunnel/server/temp"
      "fs.s3a.fast.upload.buffer" = "disk"
    }
  }
}

transform {
  # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/category/transform-v2
}

sink {
  Console {}
}

Running Command

bin/seatunnel.sh --config profile/s3_test.conf

Error Exception

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-20], ErrorDescription:['Parquet' table 'default.default.default' unsupported get catalog table with field data types '{"dim_date_id":"optional int32 dim_date_id (INTEGER(32,true))","cal_year":"optional int32 cal_year (INTEGER(32,true))","cal_half":"optional int32 cal_half (INTEGER(32,true))","cal_quarter":"optional int32 cal_quarter (INTEGER(32,true))","cal_month":"optional int32 cal_month (INTEGER(32,true))","cal_tenday_month":"optional int32 cal_tenday_month (INTEGER(32,true))","cal_week":"optional int32 cal_week (INTEGER(32,true))","cal_week_year":"optional int32 cal_week_year (INTEGER(32,true))","cal_week_month":"optional int32 cal_week_month (INTEGER(32,true))","cal_week_seq":"optional int32 cal_week_seq (INTEGER(32,true))","fscl_year":"optional int32 fscl_year (INTEGER(32,true))","fscl_half":"optional int32 fscl_half (INTEGER(32,true))","fscl_quarter":"optional int32 fscl_quarter (INTEGER(32,true))","fscl_month":"optional int32 fscl_month (INTEGER(32,true))","cal_day_id":"optional int32 cal_day_id (INTEGER(32,true))","cal_month_days":"optional int32 cal_month_days (INTEGER(32,true))"}']
    at org.apache.seatunnel.common.exception.CommonError.getCatalogTableWithUnsupportedType(CommonError.java:151)
    at org.apache.seatunnel.connectors.seatunnel.file.source.reader.ReadStrategy.buildColumnsWithErrorCheck(ReadStrategy.java:86)
    at org.apache.seatunnel.connectors.seatunnel.file.source.reader.ParquetReadStrategy.getSeaTunnelRowTypeInfo(ParquetReadStrategy.java:272)
    at org.apache.seatunnel.connectors.seatunnel.file.source.reader.ParquetReadStrategy.getSeaTunnelRowTypeInfo(ParquetReadStrategy.java:241)
    at org.apache.seatunnel.connectors.seatunnel.file.s3.source.S3FileSource.prepare(S3FileSource.java:118)
    at org.apache.seatunnel.engine.core.parse.JobConfigParser.parseSource(JobConfigParser.java:81)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parseSource(MultipleTableJobConfigParser.java:327)
    at org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser.parse(MultipleTableJobConfigParser.java:188)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.getLogicalDag(ClientJobExecutionEnvironment.java:88)
    at org.apache.seatunnel.engine.client.job.ClientJobExecutionEnvironment.execute(ClientJobExecutionEnvironment.java:156)
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:149)
    ... 2 more
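
To make the long ErrorDescription easier to read, here is a minimal Python sketch that pulls the field-to-type map out of the message and splits each entry into its Parquet physical type and its logical-type annotation. The helper name `unsupported_fields` and the shortened `error_description` string (three of the fields from the trace above) are illustrative assumptions, not part of SeaTunnel. In the trace, every listed field is an `int32` column annotated `INTEGER(32,true)`, and the exception is raised from `ParquetReadStrategy.getSeaTunnelRowTypeInfo`, which suggests (but does not prove) that the reader in this version does not map that annotation.

```python
import json
import re

# Shortened copy of the ErrorDescription from the trace above (three fields only).
error_description = (
    "'Parquet' table 'default.default.default' unsupported get catalog table "
    "with field data types '"
    '{"dim_date_id":"optional int32 dim_date_id (INTEGER(32,true))",'
    '"cal_year":"optional int32 cal_year (INTEGER(32,true))",'
    '"cal_month_days":"optional int32 cal_month_days (INTEGER(32,true))"}'
    "'"
)


def unsupported_fields(message):
    """Hypothetical helper: return {field: (physical_type, annotation)}."""
    # The field map is the only JSON object embedded in the message.
    match = re.search(r"\{.*\}", message, re.DOTALL)
    if match is None:
        return {}
    fields = json.loads(match.group(0))
    result = {}
    for name, desc in fields.items():
        # desc looks like: "optional int32 <name> (INTEGER(32,true))"
        physical = desc.split()[1]
        annotation = None
        if "(" in desc:
            annotation = desc[desc.index("(") + 1 : desc.rindex(")")]
        result[name] = (physical, annotation)
    return result


if __name__ == "__main__":
    for field, (physical, annotation) in unsupported_fields(error_description).items():
        print(f"{field}: physical={physical}, annotation={annotation}")
```

Running the same helper on the full message from the log would list all 16 fields; if they all share one annotation, as they appear to here, the problem is the type mapping rather than any single column.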

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

jdk 1.8

Screenshots

No response

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 4 days ago

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.