antony-do closed this issue 2 months ago
You should check whether the MySQL lib exists in the lib folder under the SeaTunnel directory.
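A sketch of that check, assuming the default install path `/opt/seatunnel` (the path and jar name pattern are assumptions): SeaTunnel's JDBC-based connectors expect the MySQL driver jar, e.g. `mysql-connector-java-8.0.x.jar`, to be placed in `$SEATUNNEL_HOME/lib`.

```shell
#!/bin/sh
# Sketch: look for a MySQL driver jar in $SEATUNNEL_HOME/lib
# (assumed path /opt/seatunnel; override via SEATUNNEL_HOME).
check_mysql_driver() {
    # $1 = SeaTunnel home directory
    if ls "$1"/lib/mysql-connector-*.jar >/dev/null 2>&1; then
        echo "mysql driver found"
    else
        echo "mysql driver missing"
    fi
}

check_mysql_driver "${SEATUNNEL_HOME:-/opt/seatunnel}"
```

If it reports the driver missing, copying the jar into `lib/` on the submitting machine is the usual fix.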
Thanks for your reply. I've checked, and the required libs should be there.
The following is the output of ls:
root@airflow-worker-1:/opt/seatunnel# ls -la lib
total 42892
drwxr-xr-x 2 airflow root 4096 Aug 30 03:16 .
drwxr-xr-x 1 airflow root 4096 Sep 2 04:55 ..
-rw-r--r-- 1 airflow root 42186570 Nov 8 2023 seatunnel-hadoop3-3.1.4-uber.jar
-rw-r--r-- 1 airflow root 1718859 Nov 8 2023 seatunnel-transforms-v2.jar
root@airflow-worker-1:/opt/seatunnel# ls -la connectors/
total 783872
drwxr-xr-x 1 airflow root 4096 Sep 2 08:27 .
drwxr-xr-x 1 airflow root 4096 Sep 2 04:55 ..
-rw-r--r-- 1 airflow root 13949765 Sep 2 04:53 connector-cassandra-2.3.4.jar
-rw-r--r-- 1 airflow root 29688583 Sep 2 04:53 connector-cdc-mongodb-2.3.4.jar
-rw-r--r-- 1 airflow root 29901548 Sep 2 04:53 connector-cdc-mysql-2.3.4.jar
-rw-r--r-- 1 airflow root 26264404 Sep 2 04:54 connector-cdc-sqlserver-2.3.4.jar
-rw-r--r-- 1 airflow root 30830337 Sep 2 04:54 connector-clickhouse-2.3.4.jar
-rw-r--r-- 1 airflow root 76217 Nov 8 2023 connector-console-2.3.4.jar
-rw-r--r-- 1 airflow root 5515061 Sep 2 04:54 connector-elasticsearch-2.3.4.jar
-rw-r--r-- 1 airflow root 196426 Nov 8 2023 connector-fake-2.3.4.jar
-rw-r--r-- 1 airflow root 41576013 Sep 2 04:54 connector-file-hadoop-2.3.4.jar
-rw-r--r-- 1 airflow root 40976808 Sep 2 04:54 connector-file-jindo-oss-2.3.4.jar
-rw-r--r-- 1 airflow root 41570684 Sep 2 04:54 connector-file-local-2.3.4.jar
-rw-r--r-- 1 airflow root 44582670 Sep 2 04:54 connector-file-s3-2.3.4.jar
-rw-r--r-- 1 airflow root 41876002 Sep 2 04:54 connector-file-sftp-2.3.4.jar
-rw-r--r-- 1 airflow root 46945918 Sep 2 04:54 connector-google-firestore-2.3.4.jar
-rw-r--r-- 1 airflow root 41598467 Sep 2 04:54 connector-hive-2.3.4.jar
-rw-r--r-- 1 airflow root 157681152 Sep 2 04:54 connector-hudi-2.3.4.jar
-rw-r--r-- 1 airflow root 703664 Sep 2 04:55 connector-jdbc-2.3.4.jar
-rw-r--r-- 1 airflow root 2478937 Sep 2 04:55 connector-mongodb-2.3.4.jar
-rw-r--r-- 1 airflow root 148820978 Sep 2 04:55 connector-openmldb-2.3.4.jar
-rw-r--r-- 1 airflow root 827888 Sep 2 04:55 connector-rabbitmq-2.3.4.jar
-rw-r--r-- 1 airflow root 1369176 Sep 2 04:55 connector-redis-2.3.4.jar
-rw-r--r-- 1 airflow root 53636049 Sep 2 04:55 connector-s3-redshift-2.3.4.jar
-rw-r--r-- 1 airflow root 171887 Sep 2 04:55 connector-socket-2.3.4.jar
-rw-r--r-- 1 root root 452895 Sep 2 08:15 datasource-jdbc-hive-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 root root 455302 Sep 2 08:15 datasource-jdbc-mysql-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 root root 454778 Sep 2 08:27 datasource-mysql-cdc-1.0.0-SNAPSHOT.jar
-rw-r--r-- 1 airflow root 5660 Nov 8 2023 plugin-mapping.properties
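Given these listings, a quick pre-flight check on the Airflow worker can confirm that each jar the submit command will hand to `--jars` actually exists before submitting (a sketch; the paths mirror the `--jars` list in the spark-submit log further down):

```shell
#!/bin/sh
# Pre-flight sketch: verify every jar passed to spark-submit via --jars
# exists on the submitting machine (paths copied from the submit log).
check_jars() {
    ok=1
    for jar in "$@"; do
        if [ ! -f "$jar" ]; then
            echo "missing: $jar"
            ok=0
        fi
    done
    if [ "$ok" -eq 1 ]; then
        echo "all jars present"
    fi
}

check_jars \
    /opt/seatunnel/lib/seatunnel-transforms-v2.jar \
    /opt/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar \
    /opt/seatunnel/connectors/connector-hive-2.3.4.jar \
    /opt/seatunnel/connectors/connector-jdbc-2.3.4.jar
```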
I'd like to provide more logs for this issue. These are the logs from when I run SeaTunnel on an Airflow worker:
[2024-09-02, 08:27:52 UTC] {subprocess.py:93} INFO - [INFO] LINE COUNT: 25...
[2024-09-02, 08:27:53 UTC] {subprocess.py:93} INFO - Execute SeaTunnel Spark Job: ${SPARK_HOME}/bin/spark-submit --class "org.apache.seatunnel.core.starter.spark.SeaTunnelSpark" --name "seatunnel_test" --master "yarn" --deploy-mode "cluster" --jars "/opt/seatunnel/lib/seatunnel-transforms-v2.jar,/opt/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar,/opt/seatunnel/connectors/connector-hive-2.3.4.jar,/opt/seatunnel/connectors/connector-jdbc-2.3.4.jar" --files "/opt/seatunnel/plugins.tar.gz,seatunnel_test.conf" --conf "job.mode=BATCH" --conf "parallelism=4" --conf "spark.executor.cores=6" /opt/seatunnel/starter/seatunnel-spark-3-starter.jar --config "seatunnel_test.conf" --master "yarn" --deploy-mode "cluster" --name "seatunnel_test"
[2024-09-02, 08:27:53 UTC] {subprocess.py:93} INFO - /usr/lib/spark/bin/load-spark-env.sh: line 68: ps: command not found
[2024-09-02, 08:27:54 UTC] {subprocess.py:93} INFO - Warning: Ignoring non-Spark config property: job.mode
[2024-09-02, 08:27:54 UTC] {subprocess.py:93} INFO - Warning: Ignoring non-Spark config property: parallelism
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: An illegal reflective access operation has occurred
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Illegal reflective access by org.apache.hadoop.shaded.org.xbill.DNS.ResolverConfig (file:/usr/lib/spark/jars/hadoop-client-runtime-3.3.4.jar) to method sun.net.dns.ResolverConfiguration.open()
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.shaded.org.xbill.DNS.ResolverConfig
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - WARNING: All illegal access operations will be denied in a future release
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/10.35.0.34:8032
[2024-09-02, 08:27:55 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:55 INFO AHSProxy: Connecting to Application History server at dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/10.35.0.34:10200
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Configuration: resource-types.xml not found
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO ResourceUtils: Unable to find 'resource-types.xml'.
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (60336 MB per container)
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Setting up container launch context for our AM
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Setting up the launch environment for our AM container
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 INFO Client: Preparing resources for our AM container
[2024-09-02, 08:27:56 UTC] {subprocess.py:93} INFO - 24/09/02 08:27:56 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[2024-09-02, 08:28:00 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:00 INFO Client: Uploading resource file:/tmp/spark-d8b706a6-c0e4-4e73-8cb5-291cce370baa/__spark_libs__8396041875040289561.zip -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/__spark_libs__8396041875040289561.zip
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/starter/seatunnel-spark-3-starter.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-spark-3-starter.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/lib/seatunnel-transforms-v2.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-transforms-v2.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/lib/seatunnel-hadoop3-3.1.4-uber.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/seatunnel-hadoop3-3.1.4-uber.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/connectors/connector-hive-2.3.4.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/connector-hive-2.3.4.jar
[2024-09-02, 08:28:01 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:01 INFO Client: Uploading resource file:/opt/seatunnel/connectors/connector-jdbc-2.3.4.jar -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/connector-jdbc-2.3.4.jar
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/opt/seatunnel/plugins.tar.gz -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/plugins.tar.gz
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/opt/airflow/seatunnel_config/generated_configs/daily/seatunnel_test.conf -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/ysql_dev_config_test.conf
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Uploading resource file:/tmp/spark-d8b706a6-c0e4-4e73-8cb5-291cce370baa/__spark_conf__18087849100221250538.zip -> hdfs://dataproc-cluster-name-m.asia-northeast1-c.c.project-id.internal/user/root/.sparkStaging/application_1717121928994_41663/__spark_conf__.zip
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing view acls to: root
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing modify acls to: root
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing view acls groups to:
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: Changing modify acls groups to:
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO Client: Submitting application application_1717121928994_41663 to ResourceManager
[2024-09-02, 08:28:02 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:02 INFO YarnClientImpl: Submitted application application_1717121928994_41663
[2024-09-02, 08:28:03 UTC] {subprocess.py:93} INFO - 24/09/02 08:28:03 INFO Client: Application report for application_1717121928994_41663 (state: ACCEPTED)
...
It works like a charm, thank you very much @RaymondFishWang
Search before asking
What happened
Description:
I am encountering an issue while trying to run SeaTunnel on a GCP Dataproc cluster with the following setup:
Cluster Configuration:
• GCP Dataproc cluster with Spark installed, using YARN in cluster mode.
• SeaTunnel plugins installed on every machine within the Dataproc cluster.
• Airflow workers located on separate machines, distinct from the Dataproc instances.
I've tried running SeaTunnel on every node of the cluster and it works just fine, but I want to call SeaTunnel from Airflow's workers, which are different machines, so I've set things up as follows.
Steps Taken:
Problem:
When attempting to run SeaTunnel from the Airflow workers using the following command, I encounter an error:
SeaTunnel Version
2.3.4
SeaTunnel Config
Running Command
Error Exception
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
Code of Conduct