[Bug] [SeaTunnel Engine] When recovering a stream job in cluster mode, the SeaTunnel server looks for the savepoint in the wrong place

lordk911 commented 1 year ago

Search before asking

[X] I have searched in the issues and found no similar issues.

What happened

I'm trying to recover a CDC job in SeaTunnel cluster mode. I have configured HDFS as the savepoint storage, but the SeaTunnel server still tries to find the savepoint on the local file system. In addition, the client does not receive any error saying that the savepoint was not found.

SeaTunnel Version

2.3.1

SeaTunnel Config

seatunnel:
  engine:
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      max-concurrent: 5
      tolerable-failure: 2
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://nsdev/tmp # Ensure that the directory has write permission
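The log lines below print the checkpoint path as /tmp/seatunnel/checkpoint_snapshot/694077488422191106, which is the configured namespace followed by the job ID, without a file-system scheme. To check by hand where the config above should place checkpoints on HDFS, the following Python sketch builds the fully qualified URI. The helper name is hypothetical (not part of SeaTunnel), and the rule that only the scheme and authority of fs.defaultFS are kept is an assumption modeled on how Hadoop qualifies absolute paths.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_checkpoint_dir(default_fs: str, namespace: str, job_id: str) -> str:
    """Hypothetical helper (not part of SeaTunnel).

    Joins the configured namespace and the job ID, then qualifies the result
    with the scheme and authority of fs.defaultFS. Assumption: any path part
    of fs.defaultFS (here "/tmp") is ignored for an absolute namespace.
    """
    if not namespace.startswith("/"):
        # Relative paths would depend on a working directory; not handled here.
        raise ValueError("this sketch only handles an absolute namespace")
    parts = urlsplit(default_fs)  # e.g. scheme="hdfs", netloc="nsdev"
    path = namespace.rstrip("/") + "/" + job_id
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(expected_checkpoint_dir(
    "hdfs://nsdev/tmp",
    "/tmp/seatunnel/checkpoint_snapshot",
    "694077488422191106",
))
# hdfs://nsdev/tmp/seatunnel/checkpoint_snapshot/694077488422191106
```

The printed URI can then be inspected directly, for example with `hdfs dfs -ls`, to see whether any checkpoint files were written for the job.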

Running Command

./bin/seatunnel.sh --config ./jobs/mycdc2doris.conf -r 694077488422191106

Error Exception

2023-03-31 15:14:08,580 INFO  com.hazelcast.client.impl.protocol.task.AuthenticationMessageTask - [10.0.105.246]:5801 [seatunnel] [5.1] Received auth from Connection[id=5, /10.0.105.246:5801->/10.0.105.250:50522, qualifier=null, endpoint=[10.0.105.250]:50522, remoteUuid=47a60eba-f5a1-41e5-8d6e-a3b7a05aaaed, alive=true, connectionType=JVM, planeIndex=-1], successfully authenticated, clientUuid: 47a60eba-f5a1-41e5-8d6e-a3b7a05aaaed, client name: hz.client_1, client version: 5.1
2023-03-31 15:14:09,438 INFO  org.apache.seatunnel.engine.server.master.JobMaster - Init JobMaster for Job SeaTunnel_Job (694077488422191106) 
2023-03-31 15:14:09,438 INFO  org.apache.seatunnel.engine.server.master.JobMaster - Job SeaTunnel_Job (694077488422191106) needed jar urls [file:/data/soft/seatunnel/seatunnel-2.3.1/connectors/seatunnel/connector-console-2.3.1.jar, file:/data/soft/seatunnel/seatunnel-2.3.1/connectors/seatunnel/connector-doris-2.3.1.jar, file:/data/soft/seatunnel/seatunnel-2.3.1/connectors/seatunnel/connector-cdc-mysql-2.3.1.jar]
2023-03-31 15:14:09,552 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - Path /tmp/seatunnel/checkpoint_snapshot/694077488422191106 is not a directory
2023-03-31 15:14:09,552 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - No checkpoint found for job, job id is: 694077488422191106
2023-03-31 15:14:09,562 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - Path /tmp/seatunnel/checkpoint_snapshot/694077488422191106 is not a directory
2023-03-31 15:14:09,562 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - No checkpoint found for job, job id is: 694077488422191106
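Because the submitting client does not report the missing checkpoint, the message only appears in the server log. As a workaround sketch, the Python snippet below scans log lines for the exact HdfsStorage message shown above and collects the affected job IDs. The function name is hypothetical, and reading the real server log file is left out because its location depends on the deployment.

```python
import re

# Message text copied from the HdfsStorage log lines above.
NO_CHECKPOINT = re.compile(r"No checkpoint found for job, job id is: (\d+)")


def jobs_without_checkpoint(log_lines):
    """Hypothetical helper: return job IDs, in first-seen order, for which
    the server logged that no checkpoint was found."""
    seen = []
    for line in log_lines:
        match = NO_CHECKPOINT.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "2023-03-31 15:14:09,552 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - No checkpoint found for job, job id is: 694077488422191106",
    "2023-03-31 15:14:09,562 INFO  org.apache.seatunnel.engine.checkpoint.storage.hdfs.HdfsStorage - No checkpoint found for job, job id is: 694077488422191106",
]
print(jobs_without_checkpoint(sample))  # ['694077488422191106']
```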

Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

lordk911 commented 1 year ago

I've tried cluster mode, and my checkpoint config is the same as the one shown under SeaTunnel Config above.

I can see the /tmp/seatunnel/checkpoint_snapshot directory on HDFS.

But when I try to restore my job, the server log shows the same output as under Error Exception above.

It seems SeaTunnel still tries to restore from the local file system. Because the checkpoint is not found, reusing the same label-prefix for Doris causes an error, which I can see in the SeaTunnel server log. However, the client that submitted the job does not receive any error.

EricJoy2048 commented 1 year ago

If you want to start from a savepoint, you need to use the -r param like this:

lordk911 commented 1 year ago

Yes, I use -r to restart from the savepoint, but the SeaTunnel server still looks for the savepoint on the local file system, even though I configured HDFS to store savepoints.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

yunyuntank commented 1 year ago

I have the same problem. Could you tell me how to fix it? Thanks.
