apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

Pulling data from DB2 into Hive via JDBC keeps failing #689

Closed: gmowency closed this 6 days ago

gmowency commented 2 years ago

Search before asking

What happened

SeaTunnel Version

waterdrop-1.5.4

Flink or Spark Version

Spark 2.4.0+cdh5.16.1

Java or Scala Version

Java 1.8.0_181

batch1234.conf

spark {
  spark.app.name = "Waterdrop"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

input {
  jdbc {
    driver = "COM.ibm.db2.jdbc.app.DB2Driver"
    url = "jdbc:db2://xx.xx.xx.xx:60000/cxldb"
    table = "(select sysid,field_name,option,name from cxlcm.dtcm0006_field_option_list) AS dtcm0006_field_option_list"
    user = "scuser"
    password = "scuser"
    result_table_name = "dtcm0006_field_option_list"
  }
}

filter {
  # split data by specific delimiter
}

output {
  Hive {
    source_table_name = "dtcm0006_field_option_list"
    result_table_name = "atom.incre_dtcm0006_field_option_list"
    save_mode = "overwrite"
    sink_columns = "sysid,field_name,option,name"
  }
}
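
For reference, the input block above maps onto a plain Spark JDBC read. The sketch below is not the plugin's actual code (the SparkSession variable spark and the alias t are assumptions, as in spark-shell); it shows the pattern the table option relies on: Spark's dbtable option accepts either a table name or a parenthesized subquery with an alias.

// Sketch of roughly what the Waterdrop jdbc input issues against Spark.
// Assumes a SparkSession named `spark` (e.g. inside spark-shell); the
// alias "t" is arbitrary.
val df = spark.read
  .format("jdbc")
  .option("driver", "COM.ibm.db2.jdbc.app.DB2Driver")
  .option("url", "jdbc:db2://xx.xx.xx.xx:60000/cxldb")
  .option("dbtable",
    "(select sysid,field_name,option,name from cxlcm.dtcm0006_field_option_list) AS t")
  .option("user", "scuser")
  .option("password", "scuser")
  .load()
df.printSchema()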

Launch command:

/home/etlapp/main/install/waterdrop-1.5.4/bin/start-waterdrop.sh --master yarn --deploy-mode client --config /home/etlapp/main/install/waterdrop-1.5.4/config/batch1234.conf

Error Exception

INFO internal.SharedState: loading hive config file: file:/etc/spark2/conf.cloudera.spark2_on_yarn/yarn-conf/hive-site.xml
INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/raid/hive/warehouse').
INFO internal.SharedState: Warehouse path is '/raid/hive/warehouse'.
INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2dde0a34{/SQL,null,AVAILABLE,@Spark}
INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b632442{/SQL/json,null,AVAILABLE,@Spark}
INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19d3f6ad{/SQL/execution,null,AVAILABLE,@Spark}
INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19e801b5{/SQL/execution/json,null,AVAILABLE,@Spark}
INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@8ee03f5{/static/sql,null,AVAILABLE,@Spark}
INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
found and registered UDFs count[2], UDAFs count[0]
INFO util.Version: Elasticsearch Hadoop v7.6.2 [8aeabb5ee9]
Exception in thread "main" java.lang.Exception: org.apache.spark.sql.AnalysisException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully qualified class name.;
    at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:43)
    at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully qualified class name.;
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:688)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at io.github.interestinglab.waterdrop.input.batch.Jdbc.getDataset(Jdbc.scala:76)
    at io.github.interestinglab.waterdrop.Waterdrop$.registerInputTempViewWithHead(Waterdrop.scala:306)
    at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:193)
    at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:120)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:38)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
    at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:38)
    at scala.util.Try$.apply(Try.scala:192)
    at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:38)
    ... 13 more
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@40298285{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
INFO ui.SparkUI: Stopped Spark web UI at http://host226:4040
INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
INFO cluster.YarnClientSchedulerBackend: Stopped
INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
INFO memory.MemoryStore: MemoryStore cleared
INFO storage.BlockManager: BlockManager stopped
INFO storage.BlockManagerMaster: BlockManagerMaster stopped
INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
INFO spark.SparkContext: Successfully stopped SparkContext
INFO util.ShutdownHookManager: Shutdown hook called
INFO util.ShutdownHookManager: Deleting directory /tmp/spark-cb0ba2a3-7906-47e5-8624-f3a21520ea57
INFO util.ShutdownHookManager: Deleting directory /tmp/spark-89fe00ba-d50a-460f-b104-b84763cce02b

The same error is thrown whether or not the JDBC driver jar is added.
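
Background on the error: Spark resolves the short source name "jdbc" by scanning every DataSourceRegister advertised under META-INF/services on the classpath. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource was the JDBC provider's class name before Spark 2.0 (it was renamed to JdbcRelationProvider afterwards), so the message points at an old spark-sql artifact also being on the classpath, independent of the DB2 driver jar. A minimal diagnostic sketch, assuming it runs on the same classpath as the failing job (for example pasted into spark-shell), that prints every provider claiming "jdbc" and the jar each one was loaded from:

import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Enumerate the same registrations Spark's lookupDataSource sees; two
// entries answering to "jdbc" reproduce the AnalysisException above.
val loader = Thread.currentThread().getContextClassLoader
ServiceLoader.load(classOf[DataSourceRegister], loader).asScala
  .filter(_.shortName().equalsIgnoreCase("jdbc"))
  .foreach { ds =>
    val jar = Option(ds.getClass.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("(no CodeSource)")
    println(s"${ds.getClass.getName} -> $jar")
  }

Each printed path points at a jar to remove or exclude. The exception's own workaround is to use the fully qualified class name instead of the short name, but the stack trace suggests the plugin passes "jdbc" internally (Jdbc.getDataset), so removing the duplicate jar looks like the more direct fix.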

What you expected to happen

。。。

How to reproduce

。。。

Anything else

。。。

Are you willing to submit PR?

Code of Conduct

yx91490 commented 2 years ago

Which jar does the class org.apache.spark.sql.execution.datasources.jdbc.DefaultSource come from? Can it be excluded?
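
A quick way to check, assuming a spark-shell started with the same classpath as the failing job (a sketch, not tested against this cluster):

// Print the jar that ships the duplicate class; if it is not Spark 2.4's
// own spark-sql jar, it is likely a stray pre-2.0 spark-sql artifact.
val cls = Class.forName(
  "org.apache.spark.sql.execution.datasources.jdbc.DefaultSource")
println(Option(cls.getProtectionDomain.getCodeSource)
  .map(_.getLocation)
  .getOrElse("loaded without a CodeSource"))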