apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug][Generate Sink SQL] Duplicate data is inserted; with the Hadoop environment shut down, the error "hadoop.home.dir are unset" is reported #7253

Closed · friendLive closed this 1 week ago

friendLive commented 1 month ago

Search before asking

What happened

Running the project for testing via SeaTunnelEngineExample, the data is queried and inserted into the target database successfully, but the inserted data contains 4 sets of duplicate rows.

SeaTunnel Version

2.3.5

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
    Jdbc {
        parallelism = 1
        url = "jdbc:mysql://ip:3306/db"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        query = "select * from BASE_DATASOURCE limit 16"
    }
}

transform {
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}

sink {
    Jdbc {
        url = "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "root"
        password = "12345123"
        generate_sink_sql = true
        database = test
        table = BASE_DATASOURCE
    }
}
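
A likely explanation for the duplicate rows: with generate_sink_sql = true and no primary_keys configured, the generated statement is a plain INSERT, so rows replayed after a checkpoint failure are written again. A minimal sketch of an upsert-style sink, assuming BASE_DATASOURCE has a primary key column named id (the column name is an assumption):

sink {
    Jdbc {
        url = "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "root"
        password = "12345123"
        generate_sink_sql = true
        database = test
        table = BASE_DATASOURCE
        # Assumption: id is the primary key. With primary_keys set, the generated
        # SQL can upsert (INSERT ... ON DUPLICATE KEY UPDATE on MySQL) rather than
        # blindly insert, so replayed rows do not create duplicates.
        primary_keys = ["id"]
    }
}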

seatunnel.yaml configuration:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: localfile
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: localfile
          fs.defaultFS: file:///tmp/ # Ensure that the directory has write permission

The plugin configuration only includes the MySQL connection-related plugins.

Running Command

Run the main method of SeaTunnelEngineExample.

Error Exception

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.example.engine.SeaTunnelEngineExample.main(SeaTunnelEngineExample.java:43)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: CheckpointCoordinator inside have error.
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:270)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$7(CheckpointCoordinator.java:535)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:737)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:272)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:288)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:840)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:239)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
    at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:401)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.storeCheckPoint(HdfsStorage.java:107)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:771)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$7(CheckpointCoordinator.java:533)
    ... 6 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:549)
    at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:570)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:593)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:690)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3487)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3482)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3319)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.initStorage(HdfsStorage.java:68)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage.<init>(HdfsStorage.java:57)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsFileStorageInstance.getOrCreateStorage(HdfsFileStorageInstance.java:53)
    at org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorageFactory.create(HdfsStorageFactory.java:75)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.<init>(CheckpointManager.java:104)
    at org.apache.seatunnel.engine.server.master.JobMaster.initCheckPointManager(JobMaster.java:255)
    at org.apache.seatunnel.engine.server.master.JobMaster.init(JobMaster.java:238)
    at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$3(CoordinatorService.java:475)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
    at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:469)
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:440)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:517)
    ... 19 more

    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more
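
For context, the localfile checkpoint storage is implemented by HdfsStorage on top of Hadoop's local FileSystem (visible in the HdfsStorage frames above), and on Windows that code path requires winutils.exe reachable via HADOOP_HOME or the hadoop.home.dir system property. A minimal sketch of a workaround when launching from the IDE; the C:\hadoop location is an assumption and must contain bin\winutils.exe:

// Hypothetical launcher: set hadoop.home.dir before any Hadoop class is loaded.
public class SeaTunnelEngineExampleLauncher {
    public static void main(String[] args) throws Exception {
        // Assumption: winutils.exe is installed at C:\hadoop\bin\winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        org.apache.seatunnel.example.engine.SeaTunnelEngineExample.main(args);
    }
}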

Zeta or Flink or Spark Version

No response

Java or Scala Version

jdk1.8

Screenshots

No response

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 1 week ago

This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.