apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [OSS Checkpoint] OSS checkpoint not working #6779

Open shawyb opened 2 months ago

shawyb commented 2 months ago


What happened

When I use Aliyun OSS to store checkpoints with the configuration shown under "SeaTunnel Config" below, the storage succeeds and I can find the checkpoint files in OSS. I deployed the SeaTunnel server using Docker. After I restart it, the real-time synchronization tasks I had previously created disappear, and the server does not reload historical tasks from the checkpoint. Does SeaTunnel have the ability to reload historical tasks from a checkpoint?

SeaTunnel Version

2.3.3

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 600000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          namespace: /tmp/seatunnel/checkpoint_snapshot
          oss.bucket: oss://xxx
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: xxx
          fs.oss.endpoint: oss-cn-hangzhou.aliyuncs.com
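
For anyone reproducing this, a minimal sketch of a sanity check on the `storage` section above may help rule out a simple misconfiguration before looking at restart behavior. The helper `missing_oss_keys` is hypothetical, and the list of keys is copied from this config, not from the engine's own validation. It says nothing about whether jobs are restored after a restart.

```python
# Hypothetical helper: the key names are copied from the config in this issue.
# Treating all four as mandatory is an assumption, not documented engine behavior.
REQUIRED_OSS_KEYS = (
    "oss.bucket",
    "fs.oss.accessKeyId",
    "fs.oss.accessKeySecret",
    "fs.oss.endpoint",
)


def missing_oss_keys(storage: dict) -> list:
    """Return the plugin-config keys that are absent or empty for OSS storage."""
    plugin = storage.get("plugin-config", {})
    if plugin.get("storage.type") != "oss":
        return []  # not an OSS setup, so there is nothing to check here
    return [key for key in REQUIRED_OSS_KEYS if not plugin.get(key)]


# The checkpoint storage section from this issue, written as a plain dict.
storage = {
    "type": "hdfs",
    "max-retained": 3,
    "plugin-config": {
        "storage.type": "oss",
        "namespace": "/tmp/seatunnel/checkpoint_snapshot",
        "oss.bucket": "oss://xxx",
        "fs.oss.accessKeyId": "xxx",
        "fs.oss.accessKeySecret": "xxx",
        "fs.oss.endpoint": "oss-cn-hangzhou.aliyuncs.com",
    },
}

print(missing_oss_keys(storage))  # prints [] because every key above is set
```

An empty list is consistent with the report that checkpoint files are written to OSS successfully, which suggests the open question is restoration after restart, not the storage settings.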

Running Command

run

Error Exception

ERROR org.apache.seatunnel.engine.server.operation.GetJobStatusOperation - [localhost]:5801 [seatunnel] [5.1] null
java.lang.NullPointerException: null
    at org.apache.seatunnel.engine.server.operation.GetJobStatusOperation.run(GetJobStatusOperation.java:81) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:411) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:438) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:601) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:580) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:541) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:241) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:61) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractInvocationMessageTask.processInternal(AbstractInvocationMessageTask.java:38) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractAsyncMessageTask.processMessage(AbstractAsyncMessageTask.java:71) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:153) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:116) ~[seatunnel-starter.jar:2.3.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_261]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_261]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[seatunnel-starter.jar:2.3.3]

Zeta or Flink or Spark Version

zeta 2.3.3

Java or Scala Version

1.8

Screenshots

No response



xinfeingxia85 commented 1 month ago

I also encountered this issue, but now I've been utilizing Ali OSS HDFS to store my checkpoints successfully. I suggest you consider testing this solution!

shawyb commented 1 month ago

> I also encountered this issue, but now I've been utilizing Ali OSS HDFS to store my checkpoints successfully. I suggest you consider testing this solution!

The checkpoint was stored successfully, but if I restart Docker, the checkpoint is not read and all tasks are lost.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.