apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Seatunnel Zeta] Job status SCHEDULED #7263

Open sdvdxl opened 1 month ago

sdvdxl commented 1 month ago

Search before asking

What happened

Single node, cluster mode.

jvm config:

# JVM Heap
-Xms16g
-Xmx16g

# JVM Dump
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/seatunnel/dump/zeta-server

# Metaspace
-XX:MaxMetaspaceSize=2g

# G1GC
-XX:+UseG1GC
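  1. Started several jobs, which ran for a few days.
  2. Submitted another job, and its status shows SCHEDULED. image

Questions:

  1. How should this be handled?
  2. What job statuses exist, and what does each one mean? I could not find an explanation in the documentation.
  3. How can a memory limit be specified for a job?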

SeaTunnel Version

2.3.5

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 525600
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /mnt/sdb1/data/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///mnt/sdb1/data/seatunnel/storage/ # Ensure that the directory has written permission

Running Command

bin/seatunnel.sh --config config/v2_sheng__sync_device.conf -n v2_sheng_sync_device -r 999900001

Error Exception

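No error is reported. Listing jobs with bin/seatunnel.sh -l shows the new job as SCHEDULED.

Additional information

This has happened before. I took a quick look at the source code and saw a place where memory is checked, so I guessed that resources were insufficient. After increasing the memory allocation to 16G and restarting the service, jobs could be submitted again.

This time, after restarting the server twice, it works again.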
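
A submission stuck like this can be detected from a script instead of by checking the listing by hand. The following is only a minimal sketch under stated assumptions, not something SeaTunnel ships: the function names and the `LIST_CMD` variable are hypothetical, and the sketch assumes that `bin/seatunnel.sh -l` prints each job on a single line containing both the job ID and its status.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only `bin/seatunnel.sh -l` comes from this thread.

# Command used to list jobs. Override LIST_CMD to point at another install.
LIST_CMD="${LIST_CMD:-bin/seatunnel.sh -l}"

# job_has_status LISTING JOB_ID STATUS
# Succeeds when one line of LISTING contains JOB_ID and STATUS as whole words.
# Assumption: each job is printed on one line with its ID and status.
job_has_status() {
  printf '%s\n' "$1" | grep -F -w -- "$2" | grep -F -w -- "$3" >/dev/null
}

# wait_until_not_scheduled JOB_ID [TIMEOUT_S] [INTERVAL_S]
# Returns 0 once the job's row no longer shows SCHEDULED, 1 on timeout.
# Note: a job that is missing from the listing also counts as "not SCHEDULED".
wait_until_not_scheduled() {
  local job_id="$1" timeout_s="${2:-300}" interval_s="${3:-10}"
  local waited=0 listing
  while [ "$waited" -lt "$timeout_s" ]; do
    # LIST_CMD is intentionally unquoted so it splits into command and arguments.
    listing="$($LIST_CMD 2>/dev/null || true)"
    if ! job_has_status "$listing" "$job_id" SCHEDULED; then
      return 0
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "job $job_id is still SCHEDULED after ${timeout_s}s" >&2
  return 1
}
```

For example, `wait_until_not_scheduled 2000012 600 15` would poll every 15 seconds for up to 10 minutes and return non-zero if the job never leaves SCHEDULED, which a caller could use to raise an alert before deciding to restart the server.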

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

image

Are you willing to submit PR?

Code of Conduct

liunaijie commented 1 month ago

Can you find any logs like these?

pipeline({}) start with savePoint on checkPointId({})

Create CheckpointCoordinator for job({}@{}) with plan({})

Restore job({}@{}) with checkpoint({}), data: {}
sdvdxl commented 1 month ago

Yes, here are the logs:

2024-07-25 13:59:48,030 INFO  [o.a.s.e.s.m.JobMaster         ] [seatunnel-coordinator-service-890] - Init JobMaster for Job sync-smart-ops-sheng-dev-job1 (2000012) 
2024-07-25 13:59:48,030 INFO  [o.a.s.e.s.m.JobMaster         ] [seatunnel-coordinator-service-890] - Job sync-smart-ops-sheng-dev-job1 (2000012) needed jar urls [file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-starrocks-2.3.5.jar, file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-cdc-mysql-2.3.5.jar]
2024-07-25 13:59:48,031 INFO  [.c.c.DefaultClassLoaderService] [seatunnel-coordinator-service-890] - Create classloader for job 2000012 with jars [file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-starrocks-2.3.5.jar, file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-cdc-mysql-2.3.5.jar]
2024-07-25 13:59:48,084 INFO  [.c.c.DefaultClassLoaderService] [seatunnel-coordinator-service-890] - Release classloader for job 2000012 with jars [file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-starrocks-2.3.5.jar, file:/mnt/sdb1/apache-seatunnel-2.3.5/connectors/connector-cdc-mysql-2.3.5.jar]
2024-07-25 13:59:48,142 INFO  [o.a.s.e.s.c.CheckpointManager ] [seatunnel-coordinator-service-890] - pipeline(1) start with savePoint on checkPointId(44541)
2024-07-25 13:59:48,144 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-890] - Create CheckpointCoordinator for job(1@2000012) with plan(CheckpointPlan(pipelineId=1, pipelineSubtasks=[TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=360000}, taskID=350000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=240000}, taskID=230000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30007}, taskID=40007, index=7}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30002}, taskID=40002, index=2}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=120000}, taskID=110000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=390000}, taskID=380000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30003}, taskID=40003, index=3}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=330000}, taskID=320000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=210000}, taskID=200000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=1}, taskID=20000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30005}, taskID=40005, index=5}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=90000}, taskID=80000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30000}, taskID=40000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30006}, taskID=40006, index=6}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, 
taskGroupId=300000}, taskID=290000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30001}, taskID=40001, index=1}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=180000}, taskID=170000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=450000}, taskID=440000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=60000}, taskID=50000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30008}, taskID=40008, index=8}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=270000}, taskID=260000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30009}, taskID=40009, index=9}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30004}, taskID=40004, index=4}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=150000}, taskID=140000, index=0}, TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=420000}, taskID=410000, index=0}], startingSubtasks=[TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=1}, taskID=20000, index=0}], pipelineActions={ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_device])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record_day])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_fluency_result])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_info])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.check_item])=1, ActionStateKey(name=ActionStateKey - pipeline-1 
[Sink[0]-StarRocks-smart_ops.device_last_task_record])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_state_result])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_result])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]])=10, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_info])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC])=10, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_storage_result])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_rule_config])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_report])=1, ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_pic])=1}, subtaskActions={TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=360000}, taskID=350000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_info -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_info]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_info]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30007}, taskID=40007, index=7}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 7), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 7)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=120000}, taskID=110000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_state_result]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 
[Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_device_state_result -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_state_result]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=390000}, taskID=380000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_record -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30003}, taskID=40003, index=3}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 3), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 3)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=330000}, taskID=320000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record_day]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_record_day -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_record_day]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=90000}, taskID=80000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_info]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.plan_info -> pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_info]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30000}, taskID=40000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=300000}, 
taskID=290000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_last_task_record]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.device_last_task_record -> pipeline-1 [Sink[0]-StarRocks-smart_ops.device_last_task_record]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30001}, taskID=40001, index=1}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 1), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 1)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=450000}, taskID=440000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_fluency_result]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_device_fluency_result -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_device_fluency_result]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=60000}, taskID=50000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_device]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.plan_device -> pipeline-1 [Sink[0]-StarRocks-smart_ops.plan_device]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30008}, taskID=40008, index=8}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 8), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 8)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30004}, taskID=40004, index=4}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 4), (ActionStateKey(name=ActionStateKey - pipeline-1 
[Source[0]-MySQL-CDC]), 4)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=150000}, taskID=140000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_result]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_video_result -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_result]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=240000}, taskID=230000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.performance_rule_config -> pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_rule_config]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_rule_config]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30002}, taskID=40002, index=2}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 2), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 2)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=210000}, taskID=200000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_video_storage_result -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_storage_result]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_storage_result]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=1}, taskID=20000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), -1)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30005}, taskID=40005, index=5}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 5), 
(ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 5)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30006}, taskID=40006, index=6}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 6), (ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 6)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=180000}, taskID=170000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.check_item]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.check_item -> pipeline-1 [Sink[0]-StarRocks-smart_ops.check_item]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=270000}, taskID=260000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.performance_report -> pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_report]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.performance_report]), 0)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=30009}, taskID=40009, index=9}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), 9), (ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]]), 9)], TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=2000012, pipelineId=1, taskGroupId=420000}, taskID=410000, index=0}=[(ActionStateKey(name=ActionStateKey - pipeline-1 [Shuffle [Source[0]-MySQL-CDC]] -> smart_ops.task_video_pic -> pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_pic]), 0), (ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.task_video_pic]), 0)]}))
2024-07-25 13:59:48,145 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-890] - Restore job(1@2000012) with checkpoint(44541), data: CompletedCheckpoint(jobId=2000012, pipelineId=1, checkpointId=44541, triggerTimestamp=1721386235795, checkpointType=CHECKPOINT_TYPE, completedTimestamp=1721386235798, taskStates={ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]])=ActionState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), subtaskStates=[ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=0), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=1), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=2), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=3), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=4), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=5), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=6), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=7), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=8), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [TransformChain[Transform[0]-FilterRowKind]]), index=9)], coordinatorState=null, parallelism=10), ActionStateKey(name=ActionStateKey - pipeline-1 
[Source[0]-MySQL-CDC])=ActionState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), subtaskStates=[ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=0), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=1), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=2), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=3), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=4), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=5), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=6), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=7), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=8), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=9)], coordinatorState=ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Source[0]-MySQL-CDC]), index=-1), parallelism=10), ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record])=ActionState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), subtaskStates=[ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=0), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=1), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 
[Sink[0]-StarRocks-smart_ops.device_offline_record]), index=2), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=3), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=4), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=5), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=6), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=7), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=8), ActionSubtaskState(stateKey=ActionStateKey(name=ActionStateKey - pipeline-1 [Sink[0]-StarRocks-smart_ops.device_offline_record]), index=9)], coordinatorState=null, parallelism=10)}, taskStatistics={2=TaskStatistics(jobVertexId=2, subtaskStats=[SubtaskStatistics(subtaskIndex=0, ackTimestamp=1721386235797, stateSize=1, subtaskStatus=RUNNING)], subtaskCompleted=[false], numAcknowledgedSubtasks=1, latestAckedSubtaskStatistics=null), 4=TaskStatistics(jobVertexId=4, subtaskStats=[SubtaskStatistics(subtaskIndex=0, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=1, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=2, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=3, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=4, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=5, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), 
SubtaskStatistics(subtaskIndex=6, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=7, ackTimestamp=1721386235797, stateSize=1, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=8, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=9, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING)], subtaskCompleted=[false, false, false, false, false, false, false, false, false, false], numAcknowledgedSubtasks=10, latestAckedSubtaskStatistics=null), 5=TaskStatistics(jobVertexId=5, subtaskStats=[SubtaskStatistics(subtaskIndex=0, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=1, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=2, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=3, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=4, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=5, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=6, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=7, ackTimestamp=1721386235798, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=8, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING), SubtaskStatistics(subtaskIndex=9, ackTimestamp=1721386235797, stateSize=0, subtaskStatus=RUNNING)], subtaskCompleted=[false, false, false, false, false, false, false, false, false, false], numAcknowledgedSubtasks=10, latestAckedSubtaskStatistics=null)}, isRestored=true)
2024-07-25 13:59:48,145 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-890] - Turn checkpoint_state_2000012_1 state from null to RUNNING
2024-07-25 13:59:48,146 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [seatunnel-coordinator-service-890] - The task Job sync-smart-ops-sheng-dev-job1 (2000012), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-MySQL-CDC]-SourceTask (1/10)] is in state CREATED when init state future
2024-07-25 13:59:48,148 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-890] - Job sync-smart-ops-sheng-dev-job1 (2000012) state process is start
2024-07-25 13:59:48,148 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-890] - Job sync-smart-ops-sheng-dev-job1 (2000012) turned from state CREATED to SCHEDULED.
2024-07-25 13:59:48,148 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-890] - Job sync-smart-ops-sheng-dev-job1 (2000012), Pipeline: [(1/1)] state process is start
2024-07-25 13:59:48,149 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-890] - Job sync-smart-ops-sheng-dev-job1 (2000012), Pipeline: [(1/1)] turned from state CREATED to SCHEDULED.
liunaijie commented 1 month ago

Hi, here are some updates, although I have not found the root cause yet.

  1. Your command tries to restart the job based on a savepoint ID.
  2. Based on your log, the job syncs MySQL CDC to StarRocks and will run 21 subtasks. The job has a parallelism of 10.
  3. The job is stuck in SCHEDULED status; it is blocked in the phase that requests resource slots.

Your config uses dynamic-slot, which means an unlimited number of jobs can run on this node. Currently, the resource request does not set a resource size such as CPU or memory, so the slot request should succeed.

image image image image

We need to run the real case and debug it to find the cause.

sdvdxl commented 1 month ago

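Now, when I cancel a job, it ends up in CANCELING status and never stops.

Steps

  1. I configured a SeaTunnel node in DolphinScheduler. The intent was to stop the job first (DolphinScheduler has a problem: in cluster mode, stopping the schedule does not stop the SeaTunnel job) and then restart the node. After this node was executed, the job could not be cancelled and has stayed in this status ever since.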

image

image

liunaijie commented 1 month ago

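I have run into jobs stuck in CANCELING status, but in my case it was because the job was cancelled immediately after being submitted. That triggers a problem which leaves the job stuck in CANCELING. The issue was fixed in version 2.3.6.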
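
On versions before 2.3.6, one way to avoid cancelling immediately after submission is to wait until the job is listed as RUNNING before issuing the cancel. The following is a minimal sketch of that idea, not an official workaround: the function names and the `SEATUNNEL_BIN` variable are hypothetical, only the `-l` and `-can` flags come from this thread, and the sketch assumes that each job is listed on one line containing its ID and status.

```bash
#!/usr/bin/env bash
# Hypothetical helper; only the -l and -can flags come from this thread.
SEATUNNEL_BIN="${SEATUNNEL_BIN:-bin/seatunnel.sh}"

# job_is_running JOB_ID
# Succeeds when the listing row for JOB_ID shows RUNNING.
# Assumption: each job is printed on one line with its ID and status.
job_is_running() {
  "$SEATUNNEL_BIN" -l 2>/dev/null | grep -F -w -- "$1" | grep -F -w -- RUNNING >/dev/null
}

# cancel_when_running JOB_ID [TIMEOUT_S] [INTERVAL_S]
# Polls until the job is RUNNING, then cancels it.
# Returns 1 without cancelling if the job does not reach RUNNING in time.
cancel_when_running() {
  local job_id="$1" timeout_s="${2:-120}" interval_s="${3:-5}"
  local waited=0
  until job_is_running "$job_id"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "job $job_id did not reach RUNNING within ${timeout_s}s; not cancelling" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  "$SEATUNNEL_BIN" -can "$job_id"
}
```

The sketch deliberately refuses to cancel a job that never reaches RUNNING, since a job stuck in SCHEDULED is a different situation from the one described above.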

sdvdxl commented 1 month ago

I just stopped another job with bin/seatunnel.sh -can 30000010005. After that, both 20000020001 and 30000010005 changed to failed.

13813586515 commented 1 month ago

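In cluster mode, when a job is launched through DS (DolphinScheduler), the SeaTunnel client process is still not killed even after the task is stopped. Have you run into this problem?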

liunaijie commented 1 month ago

What is the job status after the client is closed?

13813586515 commented 4 weeks ago

Even after clicking stop, the job still shows as running when checked with ./seatunnel -l.

liunaijie commented 4 weeks ago

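Probably no hook-like feature was added, or it did not take effect; I have not used it, so I am not sure. The process you launch is only a client. Once submission completes, the job runs on the server and the client only queries its status. It is possible that nothing was added to cancel the job when the client process stops, or that it did not take effect. You could file a bug with DS about this.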

sdvdxl commented 3 weeks ago

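Yes, I have run into the client process not being killed as well. What I was trying to do in this issue was really just to kill the job first and then run it again.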

sdvdxl commented 3 weeks ago

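When the job is run directly from the command line, it can actually be stopped. The likely cause is that DolphinScheduler does not pass the signal on when it stops the task.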
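
If the scheduler does deliver SIGTERM or SIGINT to the process it launched, a small wrapper around the client could turn that signal into an explicit cancel on the server. The following is a minimal sketch under that assumption, not a DolphinScheduler or SeaTunnel feature: the function names and the `SEATUNNEL_BIN` variable are hypothetical, the job ID has to be known before launch, and only the `-can` flag comes from this thread. It does not help if no signal reaches the wrapper at all.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; only the -can flag comes from this thread.
SEATUNNEL_BIN="${SEATUNNEL_BIN:-bin/seatunnel.sh}"

# cancel_job_and_stop_child JOB_ID CHILD_PID
# Asks the server to cancel the job, then stops the local client process.
cancel_job_and_stop_child() {
  "$SEATUNNEL_BIN" -can "$1" || true
  kill "$2" 2>/dev/null || true
}

# run_with_cancel_on_term JOB_ID COMMAND [ARGS...]
# Runs COMMAND in the background and waits for it. If SIGTERM or SIGINT
# arrives while waiting, the job is cancelled and the command is stopped.
run_with_cancel_on_term() {
  local job_id="$1"
  shift
  "$@" &
  local child_pid=$!
  local status=0
  trap "cancel_job_and_stop_child '$job_id' '$child_pid'" TERM INT
  # A trapped signal makes wait return early with a status above 128,
  # after which the trap handler runs.
  wait "$child_pid" || status=$?
  # Reap the child if the first wait was interrupted by the signal.
  wait "$child_pid" 2>/dev/null || true
  trap - TERM INT
  return "$status"
}
```

A caller would pass the job ID followed by the usual submit command, for example `run_with_cancel_on_term <jobId> bin/seatunnel.sh --config <file>`, so that stopping the wrapper also cancels the job instead of leaving it running on the server.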