apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [doris-source-connectors] [2.3.5] DorisConnectorException with datetime field in doris source #7405

Status: Open. Opened by Darkzoneleet 4 weeks ago.

Darkzoneleet commented 4 weeks ago

What happened

Reading from a Doris source table fails whenever the table contains a DATETIME column; tables without one work fine. Table structure (test_doris_test has the same structure):

CREATE TABLE `doris_test` (
`id` LARGEINT NOT NULL,
`create_time` DATETIME,
`s1` VARCHAR(500),
`s2` VARCHAR(500),
`s3` VARCHAR(500),
`s4` VARCHAR(500),
`s5` VARCHAR(500),
`s6` VARCHAR(500),
`s7` VARCHAR(1000),
`s8` VARCHAR(1000),
`s9` VARCHAR(1000),
`s10` VARCHAR(1000),
`s11` VARCHAR(500),
`s12` VARCHAR(500),
`s13` VARCHAR(500),
`s14` VARCHAR(500)
) ENGINE = OLAP UNIQUE KEY(`id`) DISTRIBUTED BY HASH(`id`) BUCKETS 32 PROPERTIES (
  "replication_allocation" = "tag.location.default: 1",
  "enable_unique_key_merge_on_write" = "true"
);

SeaTunnel Version

2.3.5

SeaTunnel Config

env {
  execution.parallelism = 1
  job.mode = "BATCH"
  checkpoint.interval = 10000
}

source {
  Doris {
    fenodes = "192.168.161.90:8030"
    username = doris_l
    password = "eewahTi9"
    database = "doris"
    table = "test_doris_test"
    doris.filter.query = "create_time is not null"
  }
}

sink {
  Doris {
    fenodes = "192.168.161.90:8030"
    username = doris_l
    password = "eewahTi9"
    database = "doris"
    table = "doris_test"
    sink.label-prefix = "ds"
    sink.enable-2pc = "true"
    sink.enable-delete = "true"
    doris.config {
      format = "json"
      read_json_by_line = "true"
    }
  }
}

Running Command

./bin/seatunnel.sh --config test.conf -m local

Error Exception

===============================================================================

2024-08-14 20:58:43,214 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Fatal Error, 

2024-08-14 20:58:43,214 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Please submit bug report in https://github.com/apache/seatunnel/issues

2024-08-14 20:58:43,214 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Reason:SeaTunnel job executed failed 

2024-08-14 20:58:43,217 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.connectors.doris.exception.DorisConnectorException: ErrorCode:[Doris-05], ErrorDescription:[arrow read error] - class org.apache.seatunnel.shade.org.apache.arrow.vector.TimeStampMicroVector cannot be cast to class org.apache.seatunnel.shade.org.apache.arrow.vector.VarCharVector (org.apache.seatunnel.shade.org.apache.arrow.vector.TimeStampMicroVector and org.apache.seatunnel.shade.org.apache.arrow.vector.VarCharVector are in unnamed module of loader org.apache.seatunnel.engine.common.loader.SeaTunnelChildFirstClassLoader @1986e9a7)
    at org.apache.seatunnel.connectors.doris.source.serialization.RowBatch.readArrow(RowBatch.java:132)
    at org.apache.seatunnel.connectors.doris.source.reader.DorisValueReader.hasNext(DorisValueReader.java:231)
    at org.apache.seatunnel.connectors.doris.source.reader.DorisSourceReader.pollNext(DorisSourceReader.java:75)
    at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:156)
    at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:116)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
    at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:121)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more

2024-08-14 20:58:43,217 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
===============================================================================

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.connectors.doris.exception.DorisConnectorException: ErrorCode:[Doris-05], ErrorDescription:[arrow read error] - class org.apache.seatunnel.shade.org.apache.arrow.vector.TimeStampMicroVector cannot be cast to class org.apache.seatunnel.shade.org.apache.arrow.vector.VarCharVector (org.apache.seatunnel.shade.org.apache.arrow.vector.TimeStampMicroVector and org.apache.seatunnel.shade.org.apache.arrow.vector.VarCharVector are in unnamed module of loader org.apache.seatunnel.engine.common.loader.SeaTunnelChildFirstClassLoader @1986e9a7)
    at org.apache.seatunnel.connectors.doris.source.serialization.RowBatch.readArrow(RowBatch.java:132)
    at org.apache.seatunnel.connectors.doris.source.reader.DorisValueReader.hasNext(DorisValueReader.java:231)
    at org.apache.seatunnel.connectors.doris.source.reader.DorisSourceReader.pollNext(DorisSourceReader.java:75)
    at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:156)
    at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:116)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
    at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:121)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more
2024-08-14 20:58:52,072 INFO  [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-23] - run shutdown hook because get close signal
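
The cast failure above suggests that the reader expected the DATETIME column to arrive as text (VarCharVector) while Doris 2.1.1 sent it as microsecond timestamps (TimeStampMicroVector). A minimal sketch of a decoder that accepts both shapes follows. It is not the connector's code: the class DatetimeDecodeSketch and the method decode are hypothetical, plain Java values (String, Long) stand in for the Arrow vectors, and it assumes the numeric form is microseconds since the Unix epoch and the text form is yyyy-MM-dd HH:mm:ss.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical stand-in for the branch a reader needs when a DATETIME column
// may arrive either as text or as a microsecond timestamp.
class DatetimeDecodeSketch {

    // Assumed text layout of a DATETIME value without fractional seconds.
    private static final DateTimeFormatter TEXT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Inspect the runtime type first instead of casting unconditionally.
    static LocalDateTime decode(Object raw, ZoneId zone) {
        if (raw instanceof CharSequence) {
            // Text form: parse the wall-clock value directly.
            return LocalDateTime.parse(raw.toString(), TEXT_FORMAT);
        }
        if (raw instanceof Long) {
            // Numeric form (assumption): microseconds since 1970-01-01T00:00:00Z.
            Instant instant = Instant.EPOCH.plus((Long) raw, ChronoUnit.MICROS);
            return LocalDateTime.ofInstant(instant, zone);
        }
        throw new IllegalArgumentException(
                "Unsupported DATETIME representation: "
                        + (raw == null ? "null" : raw.getClass().getName()));
    }

    public static void main(String[] args) {
        System.out.println(decode("2024-08-14 20:58:43", ZoneOffset.UTC));
        System.out.println(decode(1_723_669_123_000_000L, ZoneOffset.UTC));
    }
}
```

Both calls in main yield the same LocalDateTime when the numeric value is interpreted in UTC. Any other representation is rejected with an explicit message rather than a ClassCastException.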

Zeta or Flink or Spark Version

No response

Java or Scala Version

jdk11

Screenshots

No response

liugddx commented 4 weeks ago

Which version of Doris are you using?

Darkzoneleet commented 4 weeks ago

> What version of doris is yours?

2.1.1

luzongzhu commented 3 weeks ago

+1

liugddx commented 3 weeks ago

Doris 2.1.x currently returns time-type data with an 8-hour offset when queried. I will fix this issue once https://github.com/apache/doris/pull/38215 is merged.
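
To illustrate how an 8-hour difference of this kind can arise, the sketch below renders one instant as a wall-clock time in two zones. The class and method names are hypothetical, and the choice of UTC and UTC+8 is an assumption made only because the reported gap is 8 hours; the actual cause is whatever the linked Doris pull request addresses.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration: the same stored instant read back in two zones.
class ZoneShiftSketch {

    // Wall-clock time of an epoch-microsecond value as seen in the given zone.
    static LocalDateTime wallClock(long epochMicros, ZoneId zone) {
        Instant instant = Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS);
        return LocalDateTime.ofInstant(instant, zone);
    }

    // Whole hours between the two wall-clock renderings of one instant.
    static long hoursApart(long epochMicros, ZoneId first, ZoneId second) {
        return Duration.between(
                wallClock(epochMicros, first),
                wallClock(epochMicros, second)).toHours();
    }

    public static void main(String[] args) {
        long micros = 1_723_669_123_000_000L; // 2024-08-14T20:58:43Z
        ZoneId utc = ZoneOffset.UTC;
        ZoneId utcPlus8 = ZoneOffset.ofHours(8); // assumed server-side zone

        System.out.println(wallClock(micros, utc));      // 2024-08-14T20:58:43
        System.out.println(wallClock(micros, utcPlus8)); // 2024-08-15T04:58:43
        System.out.println(hoursApart(micros, utc, utcPlus8)); // 8
    }
}
```

If one side writes the value using one zone and the other side reads it using the other, every DATETIME appears shifted by that fixed amount.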

Darkzoneleet commented 1 week ago

> Currently, doris 2.1.x has an 8-hour time difference when querying time type data. I will fix this issue after apache/doris#38215 is fixed.

Sorry, I contacted the maintainers at SelectDB, and they don't really know what's going on. Maybe further communication with them is needed? @liugddx

liugddx commented 1 week ago

> > Currently, doris 2.1.x has an 8-hour time difference when querying time type data. I will fix this issue after apache/doris#38215 is fixed.
>
> Eh sorry, I contact maintainers in selectdb and they don't really know what's going on, maybe there's need for further communication with them? @liugddx

Take a look at this comment: https://github.com/apache/doris/issues/38174#issuecomment-2245307121