apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [SftpFile] When file_format_type is Excel, automatic type conversion is not possible #8099

Open HT-cyber opened 17 hours ago

HT-cyber commented 17 hours ago

What happened

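When file_format_type is excel, cell values are not converted to the type declared in the schema. For example, when a column in the Excel file contains both integer and string values, the job always fails with a type conversion error (java.lang.ClassCastException), even though every field in the schema is declared as string.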
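To illustrate the failure mode, here is a minimal Java sketch. It assumes (this is not taken from SeaTunnel's source) that the Excel reader passes each cell through as the raw object Apache POI exposes: a Double for a numeric cell, a String for a text cell. Under that assumption a plain cast to String throws ClassCastException on the numeric cells of a mixed column, whereas rendering the value as text does not. The class, its methods, and the sample values are hypothetical.

```java
import java.math.BigDecimal;

/**
 * Hypothetical sketch; not SeaTunnel's ExcelReadStrategy.
 * Assumption: the reader hands each cell over as the raw object Apache POI
 * exposes (Double for numeric cells, String for text, Boolean for booleans).
 */
final class LenientCellConverter {

    private LenientCellConverter() {}

    /** Naive conversion: throws ClassCastException when a numeric cell sits in a string column. */
    static String strictToString(Object cellValue) {
        return (String) cellValue;
    }

    /** Lenient conversion: renders any cell value as text instead of casting it. */
    static String toStringValue(Object cellValue) {
        if (cellValue == null) {
            return null; // empty cell stays empty
        }
        if (cellValue instanceof Double) {
            double d = (Double) cellValue;
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                return String.valueOf(d); // BigDecimal cannot represent these values
            }
            // POI stores every numeric cell as a double; print 200.0 as "200", keep 3.5 as "3.5".
            return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
        }
        return String.valueOf(cellValue);
    }

    public static void main(String[] args) {
        // Illustrative mixed column: text and numbers side by side.
        Object[] column = {"ok", 200.0, "error", 404.0, 3.5};
        for (Object cell : column) {
            System.out.println(toStringValue(cell));
        }
        try {
            strictToString(200.0);
        } catch (ClassCastException e) {
            System.out.println("strict cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Inside a real reader, Apache POI's DataFormatter.formatCellValue(Cell) is another way to obtain a cell's text as Excel displays it; the sketch avoids the POI dependency so it runs standalone.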

SeaTunnel Version

2.3.8

SeaTunnel Config

# Set the basic configuration of the task to be performed
env {
  parallelism = 1
  job.mode = "BATCH"
}

# Create a source to connect to sftp
source {
  SftpFile {
    host = ""
    port = 22
    user = "user"
    password = "password"
    path = "/test"
    file_format_type = "excel"
    skip_header_row_number = 1
    schema {
      fields {
        code = string
        data = string
        success = string
      }
    }
  }
}

Running Command

./seatunnel-2.3.8/bin/seatunnel.sh -c ./test/sftp_test.config

Error Exception

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:213)
        at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
        at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-01], ErrorDescription:[SeaTunnel read file 'sftp://seatunnel_test/t1/sftp_test_1.xlsx' failed.]
        at org.apache.seatunnel.common.exception.CommonError.fileOperationFailed(CommonError.java:68)
        at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:65)
        at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
        at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
        at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693)
        at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018)
        at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException

        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:205)
        ... 2 more
2024-11-21 18:16:29,938 INFO  [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-3] - run shutdown hook because get close signal

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

HT-cyber commented 2 hours ago

In addition, the Excel sheet must not contain any empty areas (blank cells); otherwise an exception is thrown. Empty areas are normal in real spreadsheets, so I hope at least a configuration parameter can be provided to tolerate them.
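The requested behaviour can be sketched as follows, under stated assumptions: a row arrives as a list of raw cell values in which an empty cell is null, and trailing empty cells may be missing entirely, so the row is shorter than the schema. The class name and the allowEmptyCells flag are hypothetical and are not existing SeaTunnel options; the flag only shows how a single switch could choose between failing fast and mapping blanks to null.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of tolerant row handling; allowEmptyCells is an
 * illustrative switch, not an existing SeaTunnel configuration option.
 * Assumption: empty cells arrive as null and trailing empty cells may be absent.
 */
final class TolerantRowReader {

    private final int fieldCount;
    private final boolean allowEmptyCells; // hypothetical configuration switch

    TolerantRowReader(int fieldCount, boolean allowEmptyCells) {
        this.fieldCount = fieldCount;
        this.allowEmptyCells = allowEmptyCells;
    }

    /** Pads short rows to the schema width and maps empty cells to null instead of failing. */
    Object[] readRow(List<?> cells) {
        Object[] fields = new Object[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            Object value = i < cells.size() ? cells.get(i) : null;
            if (value == null && !allowEmptyCells) {
                throw new IllegalStateException("Empty cell at column " + i);
            }
            fields[i] = value;
        }
        return fields;
    }

    public static void main(String[] args) {
        TolerantRowReader reader = new TolerantRowReader(3, true);
        // Second cell empty, third cell missing from the end of the row.
        Object[] row = reader.readRow(Arrays.asList("A01", null));
        System.out.println(Arrays.toString(row)); // [A01, null, null]
    }
}
```

Keeping the strict behaviour as the default (allowEmptyCells = false) would preserve how existing jobs behave while letting users opt in to blank-tolerant reading.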