apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Mongo] The column type is <STRING>, but a null value is being written into it #6884

Status: Open. Light-Towers opened this issue 1 month ago.

Light-Towers commented 1 month ago

What happened

The column 'exhibit_desc' is of type string in Hive and contains NULL values. Writing it to the MongoDB sink fails with the error below.

SeaTunnel Version

2.3.5

SeaTunnel Config

env {
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Hive {
    table_name = "dw.app_test"
    metastore_uri = "thrift://master02:9083"
    hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
    read_partitions = ["dt=2024-05-21"]
    read_columns = ["journal_name","exhibit_desc"]
  }
}

sink {
  MongoDB {
    uri = "mongodb://root:root@master01:27017/admin?connectTimeoutMS=10000&authSource=admin"
    database = "test"
    collection = "app_exhibition"
    schema = {
      fields {
        _id = string
        journal_name = STRING
        exhibit_desc = STRING
      }
    }
  }
}

Running Command

./bin/seatunnel.sh --config ./config/sync_conf/hive2mongo -e local

Error Exception

2024-05-22 18:01:30,065 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: java.lang.RuntimeException: org.apache.seatunnel.connectors.seatunnel.mongodb.exception.MongodbConnectorException: ErrorCode:[COMMON-07], ErrorDescription:[Unsupported data type] - The column type is <STRING>, but a null value is being written into it
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:262)
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:68)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:75)
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
    at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.seatunnel.connectors.seatunnel.mongodb.exception.MongodbConnectorException: ErrorCode:[COMMON-07], ErrorDescription:[Unsupported data type] - The column type is <STRING>, but a null value is being written into it
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$2.apply(RowDataToBsonConverters.java:95)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$2.apply(RowDataToBsonConverters.java:89)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$15.apply(RowDataToBsonConverters.java:326)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$15.apply(RowDataToBsonConverters.java:318)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$2.apply(RowDataToBsonConverters.java:101)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$2.apply(RowDataToBsonConverters.java:89)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataToBsonConverters$1.convert(RowDataToBsonConverters.java:77)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataDocumentSerializer.lambda$createWriteModelSuppliers$2(RowDataDocumentSerializer.java:94)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataDocumentSerializer.serializeToWriteModel(RowDataDocumentSerializer.java:68)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.serde.RowDataDocumentSerializer.serializeToWriteModel(RowDataDocumentSerializer.java:43)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.sink.MongodbWriter.write(MongodbWriter.java:100)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.sink.MongodbWriter.write(MongodbWriter.java:49)
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:252)
    ... 16 more

    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more
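The innermost frames point at `RowDataToBsonConverters`, where a null value for a `STRING` column appears to be rejected (`COMMON-07`) instead of being written as a BSON null. The following is a minimal, self-contained Java sketch of that behaviour next to a null-tolerant alternative. It does not use SeaTunnel or the MongoDB driver: the class `NullableConverterSketch`, its methods, and the string-based "BSON" values are hypothetical, and `NULL_MARKER` only stands in for the driver's `BsonNull.INSTANCE`.

```java
import java.util.function.Function;

// Hypothetical sketch of the null-handling wrapper the stack trace points at
// (RowDataToBsonConverters). No SeaTunnel or MongoDB driver classes are used;
// BSON values are represented as plain Java objects.
final class NullableConverterSketch {

    // Stand-in for the driver's BsonNull.INSTANCE (assumption: one shared null marker).
    static final Object NULL_MARKER = new Object() {
        @Override
        public String toString() {
            return "BsonNull";
        }
    };

    // Strict wrapper: rejects null for a typed column, i.e. the failure in this report.
    static Function<Object, Object> strict(String columnType, Function<Object, Object> inner) {
        return value -> {
            if (value == null) {
                throw new IllegalArgumentException("The column type is <" + columnType
                        + ">, but a null value is being written into it");
            }
            return inner.apply(value);
        };
    }

    // Tolerant wrapper: emits an explicit null marker instead of failing the whole job.
    static Function<Object, Object> tolerant(Function<Object, Object> inner) {
        return value -> value == null ? NULL_MARKER : inner.apply(value);
    }

    public static void main(String[] args) {
        // Hypothetical inner converter for a STRING column.
        Function<Object, Object> toBsonString = v -> "BsonString(" + v + ")";

        Function<Object, Object> nullSafe = tolerant(toBsonString);
        System.out.println(nullSafe.apply("some text")); // BsonString(some text)
        System.out.println(nullSafe.apply(null));        // BsonNull

        try {
            strict("STRING", toBsonString).apply(null);
        } catch (IllegalArgumentException e) {
            // Prints the same message as the MongodbConnectorException above.
            System.out.println(e.getMessage());
        }
    }
}
```

Mapping null to an explicit BSON null keeps the field present in the stored document; omitting the field entirely would be another option. Which behaviour the connector should adopt is not settled in this report.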

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.