apache / seatunnel

SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Connector] No suitable driver found for jdbc:mysql #7536

Closed: antony-do closed this issue 1 week ago

antony-do commented 2 weeks ago

Search before asking

What happened

Description:

I am encountering an issue while trying to run SeaTunnel on a GCP Dataproc cluster. My setup and the steps I took are described below.

Steps Taken:

1. Copied the Hadoop configuration files (yarn-site.xml, hdfs-site.xml, and core-site.xml) from the Dataproc cluster to the Airflow worker machines.
2. Updated these configuration files to point to the master node of the Dataproc cluster.

Problem:

When attempting to run SeaTunnel from the Airflow workers using the command shown in the Running Command section below, I encounter the error shown under Error Exception.

SeaTunnel Version

2.3.4

SeaTunnel Config

env {  
  parallelism = 4
  job.mode = "BATCH"
  spark.executor.cores = 6
}

source {
  Jdbc {
    url = "jdbc:mysql://mysql-host:3306/dbname"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "user_xxx"
    password = "pass_xxx"
    query = "SELECT *, DATE(created_time) as dt FROM TableX sh WHERE 1=1 AND `updated_time` >= STR_TO_DATE('2024-08-19_08:20:00.000','%Y-%m-%d_%H:%i:%S') AND `updated_time` < STR_TO_DATE('2024-08-19_08:25:00.000','%Y-%m-%d_%H:%i:%S') AND status IN (0,1,2,3,4,5,6);"
  }
}

transform {
}

sink {
  Hive {
    table_name = "table_xxx"
    metastore_uri = "thrift://metastore-host:9083"
  }
}

Running Command

${SEATUNNEL_HOME}/bin/start-seatunnel-spark-3-connector-v2.sh --config test-seatunnel.conf --master yarn  --deploy-mode cluster --name test-seatunnel

Error Exception

24/08/30 09:41:43 ERROR SeaTunnel: Fatal Error, 

24/08/30 09:41:43 ERROR SeaTunnel: Please submit bug report in https://github.com/apache/seatunnel/issues

24/08/30 09:41:43 ERROR SeaTunnel: Reason:Run SeaTunnel on spark failed 

24/08/30 09:41:43 ERROR SeaTunnel: Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: Run SeaTunnel on spark failed
    at org.apache.seatunnel.core.starter.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:62)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.spark.SeaTunnelSpark.main(SeaTunnelSpark.java:35)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)
Caused by: org.apache.seatunnel.api.table.catalog.exception.CatalogException: ErrorCode:[API-03], ErrorDescription:[Catalog initialize failed] - Failed connecting to jdbc:mysql://mysql-host:3306/dbname via JDBC.
    at org.apache.seatunnel.connectors.seatunnel.jdbc.catalog.AbstractJdbcCatalog.getConnection(AbstractJdbcCatalog.java:121)
    at org.apache.seatunnel.connectors.seatunnel.jdbc.catalog.AbstractJdbcCatalog.open(AbstractJdbcCatalog.java:127)
    at org.apache.seatunnel.connectors.seatunnel.jdbc.utils.JdbcCatalogUtils.getTables(JdbcCatalogUtils.java:78)
    at org.apache.seatunnel.connectors.seatunnel.jdbc.source.JdbcSource.<init>(JdbcSource.java:57)
    at org.apache.seatunnel.connectors.seatunnel.jdbc.source.JdbcSourceFactory.lambda$createSource$0(JdbcSourceFactory.java:78)
    at org.apache.seatunnel.core.starter.execution.PluginUtil.createSource(PluginUtil.java:85)
    at org.apache.seatunnel.core.starter.spark.execution.SourceExecuteProcessor.initializePlugins(SourceExecuteProcessor.java:130)
    at org.apache.seatunnel.core.starter.spark.execution.SparkAbstractPluginExecuteProcessor.<init>(SparkAbstractPluginExecuteProcessor.java:50)
    at org.apache.seatunnel.core.starter.spark.execution.SourceExecuteProcessor.<init>(SourceExecuteProcessor.java:62)
    at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.<init>(SparkExecution.java:54)
    at org.apache.seatunnel.core.starter.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:59)
    ... 7 more
Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://mysql-host:3306/dbname
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:228)
    at org.apache.seatunnel.connectors.seatunnel.jdbc.catalog.AbstractJdbcCatalog.getConnection(AbstractJdbcCatalog.java:117)
    ... 17 more

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

Code of Conduct

RaymondFishWang commented 2 weeks ago

You should check whether the MySQL driver jar exists in the lib folder under the SeaTunnel directory.
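That check can be scripted. The sketch below assumes the default install path `/opt/seatunnel` and a Connector/J version of 8.0.33; both are assumptions, not details from this thread.

```shell
# Hedged sketch: paths and driver version are assumptions, adjust to your environment.
SEATUNNEL_HOME="${SEATUNNEL_HOME:-/opt/seatunnel}"
MYSQL_DRIVER_VERSION="8.0.33"

# Returns success if a MySQL JDBC driver jar is present in the given directory.
has_mysql_driver() {
  ls "$1"/mysql-connector-*.jar >/dev/null 2>&1
}

if has_mysql_driver "${SEATUNNEL_HOME}/lib"; then
  echo "MySQL driver already present in ${SEATUNNEL_HOME}/lib"
else
  echo "MySQL driver missing from ${SEATUNNEL_HOME}/lib"
  # Download it from Maven Central (uncomment to actually fetch):
  # wget -P "${SEATUNNEL_HOME}/lib" \
  #   "https://repo1.maven.org/maven2/mysql/mysql-connector-java/${MYSQL_DRIVER_VERSION}/mysql-connector-java-${MYSQL_DRIVER_VERSION}.jar"
fi
```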

antony-do commented 2 weeks ago

You should check whether the MySQL driver jar exists in the lib folder under the SeaTunnel directory.

Thanks for your reply. I've checked, and the required libs should be there. The following is the output of ls:

root@airflow-worker-1:/opt/seatunnel# ls -la lib
total 42892
drwxr-xr-x 2 airflow root     4096 Aug 30 03:16 .
drwxr-xr-x 1 airflow root     4096 Sep  2 04:55 ..
-rw-r--r-- 1 airflow root 42186570 Nov  8  2023 seatunnel-hadoop3-3.1.4-uber.jar
-rw-r--r-- 1 airflow root  1718859 Nov  8  2023 seatunnel-transforms-v2.jar
root@airflow-worker-1:/opt/seatunnel# ls -la connectors/
total 783872
drwxr-xr-x 1 airflow root      4096 Sep  2 08:27 .
drwxr-xr-x 1 airflow root      4096 Sep  2 04:55 ..
-rw-r--r-- 1 airflow root  13949765 Sep  2 04:53 connector-cassandra-2.3.4.jar
-rw-r--r-- 1 airflow root  29688583 Sep  2 04:53 connector-cdc-mongodb-2.3.4.jar
-rw-r--r-- 1 airflow root  29901548 Sep  2 04:53 connector-cdc-mysql-2.3.4.jar
-rw-r--r-- 1 airflow root  26264404 Sep  2 04:54 connector-cdc-sqlserver-2.3.4.jar
-rw-r--r-- 1 airflow root  30830337 Sep  2 04:54 connector-clickhouse-2.3.4.jar
-rw-r--r-- 1 airflow root     76217 Nov  8  2023 connector-console-2.3.4.jar
-rw-r--r-- 1 airflow root   5515061 Sep  2 04:54 connector-elasticsearch-2.3.4.jar
-rw-r--r-- 1 airflow root    196426 Nov  8  2023 connector-fake-2.3.4.jar
-rw-r--r-- 1 airflow root  41576013 Sep  2 04:54 connector-file-hadoop-2.3.4.jar
-rw-r--r-- 1 airflow root  40976808 Sep  2 04:54 connector-file-jindo-oss-2.3.4.jar
-rw-r--r-- 1 airflow root  41570684 Sep  2 04:54 connector-file-local-2.3.4.jar
-rw-r--r-- 1 airflow root  44582670 Sep  2 04:54 connector-file-s3-2.3.4.jar
-rw-r--r-- 1 airflow root  41876002 Sep  2 04:54 connector-file-sftp-2.3.4.jar
-rw-r--r-- 1 airflow root  46945918 Sep  2 04:54 connector-google-firestore-2.3.4.jar
-rw-r--r-- 1 airflow root  41598467 Sep  2 04:54 connector-hive-2.3.4.jar
-rw-r--r-- 1 airflow root 157681152 Sep  2 04:54 connector-hudi-2.3.4.jar
-rw-r--r-- 1 airflow root    703664 Sep  2 04:55 connector-jdbc-2.3.4.jar
-rw-r--r-- 1 airflow root   2478937 Sep  2 04:55 connector-mongodb-2.3.4.jar
-rw-r--r-- 1 airflow root 148820978 Sep  2 04:55 connector-openmldb-2.3.4.jar
-rw-r--r-- 1 airflow root    827888 Sep  2 04:55 connector-rabbitmq-2.3.4.jar
-rw-r--r-- 1 airflow root   1369176 Sep  2 04:55 connector-redis-2.3.4.jar
-rw-r--r-- 1 airflow root  53636049 Sep  2 04:55 connector-s3-redshift-2.3.4.jar
-rw-r--r-- 1 airflow root    171887 Sep  2 04:55 connector-socket-2.3.4.jar
-rw-r--r-- 1 root    root    452895 Sep  2 08:15 datasource-jdbc-hive-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 root    root    455302 Sep  2 08:15 datasource-jdbc-mysql-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 root    root    454778 Sep  2 08:27 datasource-mysql-cdc-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 airflow root      5660 Nov  8  2023 plugin-mapping.properties
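Worth noting: neither listing contains a standalone MySQL driver jar. connector-jdbc-2.3.4.jar is the connector itself and, for licensing reasons, SeaTunnel does not bundle vendor JDBC drivers with it, so `com.mysql.cj.jdbc.Driver` may still be absent from the classpath. A quick way to confirm is a raw grep on the jars for the driver class path (a sketch; zip archives store entry names as plain bytes, so this works even without unzip):

```shell
# Returns success if the given class path appears inside the jar.
# Zip entry names are stored verbatim, so grep -a can find them in the archive.
jar_has_class() {
  grep -aq "$2" "$1" 2>/dev/null
}

# Check every candidate jar for the MySQL driver class (paths from the listing above).
for jar in /opt/seatunnel/lib/*.jar /opt/seatunnel/connectors/connector-jdbc-*.jar; do
  if jar_has_class "$jar" "com/mysql/cj/jdbc/Driver.class"; then
    echo "driver found in $jar"
  fi
done
```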

I'd like to provide more logs for this issue. These are the logs from running SeaTunnel on the Airflow worker:

[2024-09-02, 08:27:52 UTC] {subprocess.py:93} INFO - [INFO] LINE COUNT: 25...
[2024-09-02, 08:27:53 UTC] {subprocess.py:93} INFO - Execute SeaTunnel Spark Job: ${SPARK_HOME}/bin/spark-submit --class "org.apache.seatunnel.core.starter.spark.SeaTunnelSpark" --name "seatunnel_test" --master "yarn" --deploy-mode "cluster" --jars "/opt/seatunnel/lib/seatunnel-transforms-v2.jar,/opt/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar,/opt/seatunnel/connectors/connector-hive-2.3.4.jar,/opt/seatunnel/connectors/connector-jdbc-2.3.4.jar" --files "/opt/seatunnel/plugins.tar.gz,seatunnel_test.conf" --conf "job.mode=BATCH" --conf "parallelism=4" --conf "spark.executor.cores=6" /opt/seatunnel/starter/seatunnel-spark-3-starter.jar --config "seatunnel_test.conf" --master "yarn" --deploy-mode "cluster" --name "seatunnel_test"
[2024-09-02, 08:27:53 UTC] {subprocess.py:93} INFO - /usr/lib/spark/bin/load-spark-env.sh: line 68: ps: command not found
[2024-09-02, 08:27:54 UTC] {subprocess.py:93} INFO - Warning: Ignoring non-Spark config property: job.mode
[2024-09-02, 08:27:54 UTC] {subprocess.py:93} INFO - Warning: Ignoring non-Spark config property: parallelism
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: An illegal reflective access operation has occurred
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Illegal reflective access by org.apache.hadoop.shaded.org.xbill.DNS.ResolverConfig (file:/usr/lib/spark/jars/hadoop-client-runtime-3.3.4.jar) to method sun.net.dns.ResolverConfiguration.open()
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.shaded.org.xbill.DNS.ResolverConfig
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: All illegal access operations will be denied in a future release
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/10.35.0.34:8032
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 INFO AHSProxy: Connecting to Application History server at dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/10.35.0.34:10200
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Configuration: resource-types.xml not found
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO ResourceUtils: Unable to find 'resource-types.xml'.
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (60336 MB per container)
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Setting up container launch context for our AM
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Setting up the launch environment for our AM container
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Preparing resources for our AM container
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[2024-09-02, 08:28:00 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:00 INFO Client: Uploading resource file:/tmp/spark-d8b706a6-c0e4-4e73-8cb5-291cce370baa/__spark_libs__8396041875040289561.zip -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/__spark_libs__8396041875040289561.zip
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/starter/seatunnel-spark-3-starter.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-spark-3-starter.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/lib/seatunnel-transforms-v2.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-transforms-v2.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-hadoop3-3.1.4-uber.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/connectors/connector-hive-2.3.4.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/connector-hive-2.3.4.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/connectors/connector-jdbc-2.3.4.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/connector-jdbc-2.3.4.jar
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/opt/seatunnel/plugins.tar.gz -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/plugins.tar.gz
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/opt/airflow/seatunnel_config/generated_configs/daily/seatunnel_test.conf -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/ysql_dev_config_test.conf
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/tmp/spark-d8b706a6-c0e4-4e73-8cb5-291cce370baa/__spark_conf__18087849100221250538.zip -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/__spark_conf__.zip
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing view acls to: root
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing modify acls to: root
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing view acls groups to:
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing modify acls groups to:
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Submitting application application_1717121928994_41663 to ResourceManager
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO YarnClientImpl: Submitted application application_1717121928994_41663
[2024-09-02, 08:28:03 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:03 INFO Client: Application report for application_1717121928994_41663 (state: ACCEPTED)
...
RaymondFishWang commented 2 weeks ago

[screenshot]

antony-do commented 1 week ago

[screenshot]

It works like a charm. Thank you very much, @RaymondFishWang!
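The screenshots above are not preserved, so the exact fix is not visible in this transcript. The usual resolution for "No suitable driver found", however, is simply to place the MySQL JDBC driver jar under ${SEATUNNEL_HOME}/lib; the submit log above shows the starter script passing the jars in lib/ to spark-submit via --jars, so a driver placed there should reach the YARN containers. A hedged sketch (the driver version and paths are assumptions):

```shell
# Hedged sketch of the likely fix; driver version and paths are assumptions.
SEATUNNEL_HOME="${SEATUNNEL_HOME:-/opt/seatunnel}"
DRIVER_JAR="mysql-connector-java-8.0.33.jar"

# Copy a previously downloaded driver jar into SeaTunnel's lib directory.
install_driver() {
  # $1: directory holding the downloaded driver jar, $2: target lib directory
  cp "$1/${DRIVER_JAR}" "$2/" && echo "installed ${DRIVER_JAR} into $2"
}

# install_driver /tmp/downloads "${SEATUNNEL_HOME}/lib"
# ...then re-run the original submit command unchanged:
# ${SEATUNNEL_HOME}/bin/start-seatunnel-spark-3-connector-v2.sh \
#   --config test-seatunnel.conf --master yarn --deploy-mode cluster --name test-seatunnel
```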