apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[FAQ][spark-jdbc-source]"Unsupported type OTHER" ERROR when read CK DateTime64 field #1418

Open kyle-cx91 opened 2 years ago

kyle-cx91 commented 2 years ago

I found that SeaTunnel has no native spark-clickhouse-source, but I need to read some data from a ClickHouse table, so I used spark-jdbc-source instead and ran into a problem that prevents some of the data from being read by SeaTunnel.

Problem description:

1. SeaTunnel version: incubating-2.0.5, built by myself from the dev branch
2. Everything is fine when there is no DateTime64 field
3. I get an "Unsupported type OTHER" error when there is a DateTime64 field, while Date and DateTime work well

Question:

To read ClickHouse DateTime64 field data, what should we do? Add a ClickHouse source connector, or make the JDBC source support DateTime64?

Steps to reproduce the "Unsupported type OTHER" ERROR

table schema

CREATE TABLE pdi.st_source_dates
(
    `date04` DateTime64,
    `cust_id` String,
    `cust_name` String,
    `cust_idcard` String,
    `age` Int32,
    `sex` String
)
ENGINE = StripeLog

SeaTunnel config

env {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}

source {
  jdbc {
    driver = "ru.yandex.clickhouse.ClickHouseDriver"
    url = "jdbc:clickhouse://192.168.10.204:8123/pdi"
    table = "st_source_dates"
    result_table_name = "SOURCE_VIEW"
    user = "default"
    password = "jRtXD8F8"
  }
}

transform {
}

sink {
 Console {}
}

Error Log

22/03/07 17:53:57 INFO SharedState: Warehouse path is 'file:/Users/zhangchenghu/software/seatunnel/spark-warehouse/'.
22/03/07 17:53:57 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
22/03/07 17:53:58 INFO Version: Elasticsearch Hadoop v6.8.3 [8a5f44bf7d]
22/03/07 17:53:58 INFO ClickHouseDriver: Driver registered
22/03/07 17:53:59 ERROR Seatunnel: 

===============================================================================

22/03/07 17:53:59 ERROR Seatunnel: Fatal Error, 

22/03/07 17:53:59 ERROR Seatunnel: Please submit bug report in https://github.com/apache/incubator-seatunnel/issues

22/03/07 17:53:59 ERROR Seatunnel: Reason:Unsupported type OTHER 

22/03/07 17:53:59 ERROR Seatunnel: Exception StackTrace:java.sql.SQLException: Unsupported type OTHER
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
    at org.apache.seatunnel.spark.source.Jdbc.getData(Jdbc.scala:33)
    at org.apache.seatunnel.spark.source.Jdbc.getData(Jdbc.scala:28)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.registerInputTempView(SparkBatchExecution.java:54)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.lambda$start$0(SparkBatchExecution.java:95)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:95)
    at org.apache.seatunnel.Seatunnel.entryPoint(Seatunnel.java:103)
    at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:61)
    at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

22/03/07 17:53:59 ERROR Seatunnel: 
===============================================================================

Exception in thread "main" java.sql.SQLException: Unsupported type OTHER
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
    at org.apache.seatunnel.spark.source.Jdbc.getData(Jdbc.scala:33)
    at org.apache.seatunnel.spark.source.Jdbc.getData(Jdbc.scala:28)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.registerInputTempView(SparkBatchExecution.java:54)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.lambda$start$0(SparkBatchExecution.java:95)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:95)
    at org.apache.seatunnel.Seatunnel.entryPoint(Seatunnel.java:103)
    at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:61)
    at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
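
The stack trace points at Spark's `JdbcUtils.getCatalystType`, which maps JDBC type codes to Catalyst types and throws for codes it does not recognize; the older ClickHouse driver reports DateTime64 as `java.sql.Types.OTHER` (code 1111), which has no mapping. A minimal sketch of that dispatch logic (a simplified illustration, not Spark's actual code):

```java
import java.sql.Types;

public class JdbcTypeMapping {
    // Simplified, hypothetical version of the dispatch in Spark's
    // JdbcUtils.getCatalystType: known JDBC type codes map to a Catalyst
    // type name; anything else raises "Unsupported type <code>".
    static String getCatalystType(int sqlType) throws java.sql.SQLException {
        switch (sqlType) {
            case Types.DATE:      return "DateType";
            case Types.TIMESTAMP: return "TimestampType";
            case Types.VARCHAR:   return "StringType";
            case Types.INTEGER:   return "IntegerType";
            default:
                // Types.OTHER (code 1111) is what the 0.2.x ClickHouse
                // driver reports for DateTime64, so it lands here.
                throw new java.sql.SQLException("Unsupported type " + sqlType);
        }
    }

    public static void main(String[] args) throws Exception {
        // DateTime is reported as TIMESTAMP, so it maps cleanly.
        System.out.println(getCatalystType(Types.TIMESTAMP)); // prints "TimestampType"
        try {
            // DateTime64 under the old driver arrives as OTHER and fails.
            getCatalystType(Types.OTHER);
        } catch (java.sql.SQLException e) {
            System.out.println(e.getMessage()); // prints "Unsupported type 1111"
        }
    }
}
```

This is why Date and DateTime columns read fine while DateTime64 fails: the failure happens at schema resolution, before any rows are fetched.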
kyle-cx91 commented 2 years ago

@CalvinKirs Please take a look at this issue

Hisoka-X commented 2 years ago

It seems the ClickHouse JDBC driver has some problem working with Spark JDBC. I will figure it out and fix it.

Hisoka-X commented 2 years ago

@ZhangchengHu0923 Which ClickHouse driver version did you use? I used https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.3.2 and it worked fine. Note that ru.yandex.clickhouse.ClickHouseDriver will be removed; you should use com.clickhouse.jdbc.ClickHouseDriver. But in this version ru.yandex.clickhouse.ClickHouseDriver also works fine.
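
The newer driver from that link can be pulled in with the Maven coordinates shown on the linked page (version as linked; check for a later release before use):

```xml
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.3.2</version>
</dependency>
```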

kyle-cx91 commented 2 years ago

> @ZhangchengHu0923 Which ClickHouse driver version did you use? I used https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.3.2 and it worked fine. Note that ru.yandex.clickhouse.ClickHouseDriver will be removed; you should use com.clickhouse.jdbc.ClickHouseDriver. But in this version ru.yandex.clickhouse.ClickHouseDriver also works fine.

@BenJFan Thanks for your reply. I used clickhouse-jdbc-0.2.6.jar with ru.yandex.clickhouse.ClickHouseDriver.

Hisoka-X commented 2 years ago

> > @ZhangchengHu0923 Which ClickHouse driver version did you use? I used https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.3.2 and it worked fine. Note that ru.yandex.clickhouse.ClickHouseDriver will be removed; you should use com.clickhouse.jdbc.ClickHouseDriver. But in this version ru.yandex.clickhouse.ClickHouseDriver also works fine.
>
> @BenJFan Thanks for your reply. I used clickhouse-jdbc-0.2.6.jar with ru.yandex.clickhouse.ClickHouseDriver.

Maybe updating your JDBC driver version would be a good choice.
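
If upgrading the driver is not an option, one possible workaround is to cast the DateTime64 column server-side so the driver reports a type Spark can map. This is an untested sketch, assuming the SeaTunnel `table` parameter is passed through to Spark's `dbtable` option, which accepts a parenthesized subquery:

```
source {
  jdbc {
    driver = "ru.yandex.clickhouse.ClickHouseDriver"
    url = "jdbc:clickhouse://192.168.10.204:8123/pdi"
    # Hypothetical subquery pushdown: toString() makes ClickHouse return
    # the DateTime64 column as String, which Spark JDBC can map.
    table = "(SELECT toString(date04) AS date04, cust_id, cust_name, cust_idcard, age, sex FROM st_source_dates) AS t"
    result_table_name = "SOURCE_VIEW"
    user = "default"
    password = "jRtXD8F8"
  }
}
```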