apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Spark] Error NoSuchFileException when run spark-submit in cluster mode #5373

Open vraziel opened 1 year ago

vraziel commented 1 year ago

What happened

Hello,

I was testing SeaTunnel with Spark in cluster mode. The Spark cluster runs in Docker; you can create one with the following docker-compose file.

docker-compose.txt

After the cluster was ready, I ran the following command:

bin/start-seatunnel-spark-3-connector-v2.sh -m spark://127.0.0.1:7077 -e cluster -c config/v2.batch.config.template

But the Spark engine threw an error.
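
For reference, my understanding is that the starter script is a thin wrapper around spark-submit; the main class and trailing arguments below are my assumption, not copied from the actual 2.3.3 script:

# Hypothetical shape of the underlying submission (class name and
# trailing arguments are assumptions, not taken from the real script):
${SPARK_HOME}/bin/spark-submit \
    --class org.apache.seatunnel.core.starter.spark.SeaTunnelSpark \
    --master spark://127.0.0.1:7077 \
    --deploy-mode cluster \
    /home/uraziel/apache-seatunnel-2.3.3-SNAPSHOT-bin/starter/seatunnel-spark-3-starter.jar \
    --config config/v2.batch.config.template

With --deploy-mode cluster, the driver is launched on one of the workers, and the application jar path is resolved on that worker's filesystem.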

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  # This is an example source plugin, intended only to test and demonstrate the source plugin feature
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # If you would like to get more information about how to configure SeaTunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/category/sink-v2
}

Running Command

bin/start-seatunnel-spark-3-connector-v2.sh -m spark://127.0.0.1:7077 -e cluster -c config/v2.batch.config.template

Error Exception

Exception from cluster was: java.nio.file.NoSuchFileException: /home/uraziel/apache-seatunnel-2.3.3-SNAPSHOT-bin/starter/seatunnel-spark-3-starter.jar
java.nio.file.NoSuchFileException: /home/uraziel/apache-seatunnel-2.3.3-SNAPSHOT-bin/starter/seatunnel-spark-3-starter.jar
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
    at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
    at java.nio.file.Files.copy(Files.java:1274)
    at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:771)
    at org.apache.spark.util.Utils$.copyFile(Utils.scala:742)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:815)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:557)
    at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)
    at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)
    at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)
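
The trace fails in DriverRunner.downloadUserJar: in Spark standalone cluster mode, the driver is started on one of the workers, and that worker copies the application jar from the submit-time path on its own local filesystem. So the jar has to exist at that exact path on whichever worker hosts the driver. A quick check (the container name is hypothetical, adjust it to your compose service):

# Run against each worker container to see whether the jar exists
# at the path the driver tries to copy from:
docker exec spark-worker-1 \
    ls -l /home/uraziel/apache-seatunnel-2.3.3-SNAPSHOT-bin/starter/seatunnel-spark-3-starter.jar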

Zeta or Flink or Spark Version

Spark 3.3.1

Java or Scala Version

No response

Screenshots

[Screenshot: shell]

[Screenshot: Docker]

vraziel commented 1 year ago

I've changed the docker-compose file; it now mounts a directory containing SeaTunnel.

docker-compose-mounted-seatunnel.txt

The command

bin/start-seatunnel-spark-3-connector-v2.sh -m spark://0.0.0.0:7077 -e client -c config/v2.batch.config.template

works properly. But the command

bin/start-seatunnel-spark-3-connector-v2.sh -m spark://0.0.0.0:7077 -e cluster -c config/v2.batch.config.template

fails.

What am I doing wrong?

Should I copy seatunnel-spark-3-starter.jar to $SPARK_HOME/jars on every member of the cluster?

liugddx commented 1 year ago

> Should I copy seatunnel-spark-3-starter.jar to $SPARK_HOME/jars on every member of the cluster?

You can try this.
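
A minimal sketch of that suggestion. Note the exception above is about the submitted jar path itself, so mirroring that exact absolute path on every worker is likely what matters; the container names here are hypothetical, adjust them to your docker-compose services:

# Copy the starter jar to the same absolute path inside each worker
# container so the driver can resolve it in cluster mode:
JAR=/home/uraziel/apache-seatunnel-2.3.3-SNAPSHOT-bin/starter/seatunnel-spark-3-starter.jar
for w in spark-worker-1 spark-worker-2; do
    docker exec "$w" mkdir -p "$(dirname "$JAR")"
    docker cp "$JAR" "$w:$JAR"
done

Alternatively, mounting the same host directory into every worker service in the compose file avoids the copy step entirely.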