apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0
12.8k stars 4.6k forks

[Bug] [Master & Worker] The task killed itself abnormally resulting in failure #10435

Closed shangeyao closed 2 years ago

shangeyao commented 2 years ago

Search before asking

What happened

When I start several hundred processes at about the same time, I receive two alerts:

{"id":68,"title":"stop failed","content":"[]","log":null,"warnType":2}
{"id":69,"title":"scheduler failed","content":"[{\"projectCode\":853772144975872,\"projectName\":\"near_realtime\",\"owner\":\"admin\",\"processId\":1160,\"processDefinitionCode\":853779489988608,\"processName\":\"ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925\",\"taskCode\":853779488874496,\"taskName\":\"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)\",\"taskType\":\"SHELL\",\"taskState\":\"KILL\",\"taskStartTime\":\"2022-06-13 19:00:01\",\"taskEndTime\":\"2022-06-13 19:00:01\",\"taskHost\":\"10.3.7.60:1234\",\"logPath\":\"/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log\"}]","log":null,"warnType":2}

I then checked the master and worker logs.

Master Log

[INFO] 2022-06-13 19:00:00.948 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[203] - handle command 1405 end, create process instance 1160
[INFO] 2022-06-13 19:00:00.999 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[1286] - start submit task : db2hive(stg_sqoop_retail_kc_stockorder_detail_hour), instance id:1160, state: RUNNING_EXECUTION
[INFO] 2022-06-13 19:00:01.002 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[1300] - end submit task to db successfully:2293 db2hive(stg_sqoop_retail_kc_stockorder_detail_hour) state:SUBMITTED_SUCCESS complete, instance id:1160 state: RUNNING_EXECUTION  
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.60:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.62:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.110 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.114 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=6, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.61:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=114874, appIds='', varPool=[]}
[INFO] 2022-06-13 19:00:01.118 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=9, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.61:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=0, appIds='null', varPool=[]}
[INFO] 2022-06-13 19:00:01.122 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.60:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'}
[INFO] 2022-06-13 19:00:01.134 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=6, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.60:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=20981, appIds='', varPool=[]}
[INFO] 2022-06-13 19:00:01.246 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=9, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.60:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=21003, appIds='null', varPool=[]}
[INFO] 2022-06-13 19:00:01.625 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[127] - persist events 1160 succeeded.
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[448] - work flow 1160 task id:2293 code:853779488874496 state:KILL 
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[1675] - work flow process instance [id: 1160, name:ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925], state change from RUNNING_EXECUTION to FAILURE, cmd type: SCHEDULER
[INFO] 2022-06-13 19:00:01.693 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: FAILURE task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: KILL task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: FAILURE task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: KILL task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: PROCESS_STATE_CHANGE executeStatus: FAILURE task instance id: 0 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[680] - process:1160 state FAILURE change to FAILURE
[INFO] 2022-06-13 19:00:01.723 +0800 org.apache.dolphinscheduler.service.alert.ProcessAlertManager:[237] - add alert to db , alert: Alert{id=69, sign='acf0fc20f9dbaf2294d78c959cf7bfc7501f6f09', title='scheduler failed', content='[{"projectCode":853772144975872,"projectName":"near_realtime","owner":"admin","processId":1160,"processDefinitionCode":853779489988608,"processName":"ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925","taskCode":853779488874496,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","taskType":"SHELL","taskState":"KILL","taskStartTime":"2022-06-13 19:00:01","taskEndTime":"2022-06-13 19:00:01","taskHost":"10.3.7.60:1234","logPath":"/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log"}]', alertStatus=null, warningType=FAILURE, log='null', alertGroupId=1, createTime=Mon Jun 13 19:00:01 CST 2022, updateTime=null, info={}}
[INFO] 2022-06-13 19:00:01.724 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[131] - process instance 1160 finished.
[INFO] 2022-06-13 19:02:23.209 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=7, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:02:23 CST 2022, host=10.3.7.62:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=5718, appIds='application_1649386217358_1093922', varPool=[]}
[INFO] 2022-06-13 19:06:35.416 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=7, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:02:23 CST 2022, host=10.3.7.62:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=5718, appIds='application_1649386217358_1093922', varPool=[]}

Worker Log

[INFO] 2022-06-13 19:00:01.096 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[107] - task execute request command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":2293,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","firstSubmitTime":"2022-06-13 19:00:01","startTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":853779489988608,"processDefineVersion":1,"appIds":null,"processInstanceId":1160,"scheduleTime":"2022-06-13 19:00:00","globalParams":"[{\"prop\":\"day\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"2022-06-13\"},{\"prop\":\"dt\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"20220613\"}]","executorId":3,"cmdTypeIfComplement":6,"tenantCode":"dwadmin","queue":"default","processDefineId":0,"projectId":0,"projectCode":853772144975872,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"#!/bin/bash\\nbash /dev/program/env/bin/dolphin_db2hive.sh stg/stg_sqoop_retail_kc_stockorder_detail.db2hive ${day} 
${dt}\",\"dependence\":{},\"conditionResult\":{\"successNode\":[\"\"],\"failedNode\":[\"\"]},\"waitStartTimeout\":{\"strategy\":\"FAILED\",\"interval\":null,\"checkInterval\":null,\"enable\":false}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":"SUBMITTED_SUCCESS","taskLogName":null,"resourceParametersHelper":null,"endTime":null,"k8sTaskExecutionContext":{"configYaml":null},"resources":{},"varPool":null,"dryRun":0,"paramsMap":null,"dataQualityTaskExecutionContext":{"ruleId":0,"ruleName":null,"ruleType":0,"ruleInputEntryList":null,"executeSqlList":null,"comparisonNeedStatisticsValueTable":false,"compareWithFixedValue":false,"hdfsPath":null,"sourceConnectorType":null,"sourceType":0,"sourceConnectionParams":null,"targetConnectorType":null,"targetType":0,"targetConnectionParams":null,"writerConnectorType":null,"writerType":0,"writerTable":null,"writerConnectionParams":null,"statisticsValueConnectorType":null,"statisticsValueType":0,"statisticsValueTable":null,"statisticsValueWriterConnectionParams":null},"cpuQuota":-1,"memoryMax":-1}'}
[INFO] 2022-06-13 19:00:01.097 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[107] - task execute request command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":2293,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","firstSubmitTime":"2022-06-13 19:00:01","startTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":853779489988608,"processDefineVersion":1,"appIds":null,"processInstanceId":1160,"scheduleTime":"2022-06-13 19:00:00","globalParams":"[{\"prop\":\"day\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"2022-06-13\"},{\"prop\":\"dt\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"20220613\"}]","executorId":3,"cmdTypeIfComplement":6,"tenantCode":"dwadmin","queue":"default","processDefineId":0,"projectId":0,"projectCode":853772144975872,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"#!/bin/bash\\nbash /dev/program/env/bin/dolphin_db2hive.sh stg/stg_sqoop_retail_kc_stockorder_detail.db2hive ${day} 
${dt}\",\"dependence\":{},\"conditionResult\":{\"successNode\":[\"\"],\"failedNode\":[\"\"]},\"waitStartTimeout\":{\"strategy\":\"FAILED\",\"interval\":null,\"checkInterval\":null,\"enable\":false}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":"SUBMITTED_SUCCESS","taskLogName":null,"resourceParametersHelper":null,"endTime":null,"k8sTaskExecutionContext":{"configYaml":null},"resources":{},"varPool":null,"dryRun":0,"paramsMap":null,"dataQualityTaskExecutionContext":{"ruleId":0,"ruleName":null,"ruleType":0,"ruleInputEntryList":null,"executeSqlList":null,"comparisonNeedStatisticsValueTable":false,"compareWithFixedValue":false,"hdfsPath":null,"sourceConnectorType":null,"sourceType":0,"sourceConnectionParams":null,"targetConnectorType":null,"targetType":0,"targetConnectionParams":null,"writerConnectorType":null,"writerType":0,"writerTable":null,"writerConnectionParams":null,"statisticsValueConnectorType":null,"statisticsValueType":0,"statisticsValueTable":null,"statisticsValueWriterConnectionParams":null},"cpuQuota":-1,"memoryMax":-1}'}
[INFO] 2022-06-13 19:00:01.098 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[154] - task instance local execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[131] - script path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[135] - the task begins to execute. task instance id: 2293
[INFO] 2022-06-13 19:00:01.100 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[154] - task instance local execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[139] - task execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[85] - tenantCode user:dwadmin, task dir:1160_2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293_node.sh
[INFO] 2022-06-13 19:00:01.108 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[331] - task run command: sudo -u dwadmin sh /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
[INFO] 2022-06-13 19:00:01.118 +0800 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.119 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[131] - script path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.119 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[135] - the task begins to execute. task instance id: 2293
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[139] - task execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[246] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId:20981 ,exitStatusCode:127 ,processWaitForStatus:true ,processExitValue:127
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[85] - tenantCode user:dwadmin, task dir:1160_2293
[INFO] 2022-06-13 19:00:01.127 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293_node.sh
[INFO] 2022-06-13 19:00:01.130 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[191] - task instance id : 2293,task final status : FAILURE
[INFO] 2022-06-13 19:00:01.130 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[331] - task run command: sudo -u dwadmin sh /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
[INFO] 2022-06-13 19:00:01.134 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[233] - exec local path: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293 cleared.
[INFO] 2022-06-13 19:00:01.244 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[191] - task instance id : 2293,task final status : KILL
[INFO] 2022-06-13 19:00:01.244 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[233] - exec local path: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293 cleared.
[INFO] 2022-06-13 19:00:01.608 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteRunningAckProcessor:[56] - task execute running ack command : TaskExecuteRunningAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.622 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteRunningAckProcessor:[56] - task execute running ack command : TaskExecuteRunningAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.625 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteResponseAckProcessor:[56] - task execute response ack command : TaskExecuteResponseAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.626 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteResponseAckProcessor:[56] - task execute response ack command : TaskExecuteResponseAckCommand{taskInstanceId=2293, status=7}
        sh: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command: No such file or directory
        sh: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command: No such file or directory
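
The master log above shows task instance 2293 being reported from three different worker hosts. A minimal sketch of how such cases could be pulled out of a master log is shown below. The function names `hosts_per_task` and `duplicated` are hypothetical helpers written for this report, not part of DolphinScheduler, and the sample lines are abbreviated copies of the log lines above.

```python
import re
from collections import defaultdict

# Fields shared by the TaskExecuteRunningCommand and
# TaskExecuteResponseCommand lines in the master log.
FIELD_RE = {
    "task": re.compile(r"taskInstanceId=(\d+)"),
    "status": re.compile(r"\bstatus=(\d+)"),
    "host": re.compile(r"\bhost='?([0-9.]+:\d+)'?"),
}


def hosts_per_task(lines):
    """Return {task_instance_id: {host: [status codes in log order]}}."""
    seen = defaultdict(lambda: defaultdict(list))
    for line in lines:
        found = {name: rx.search(line) for name, rx in FIELD_RE.items()}
        if not all(found.values()):
            continue  # not a running/response command line
        task = int(found["task"].group(1))
        host = found["host"].group(1)
        seen[task][host].append(int(found["status"].group(1)))
    # Convert to plain dicts so later lookups cannot create entries.
    return {task: dict(hosts) for task, hosts in seen.items()}


def duplicated(seen):
    """Keep only task instances that were reported by more than one host."""
    return {task: sorted(hosts) for task, hosts in seen.items() if len(hosts) > 1}


# Abbreviated master-log lines from this report (unused fields dropped).
SAMPLE = [
    "TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', host='10.3.7.61:1234', status=1}",
    "TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', host='10.3.7.60:1234', status=1}",
    "TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', host='10.3.7.62:1234', status=1}",
    "TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=6, host=10.3.7.61:1234}",
    "TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=9, host=10.3.7.60:1234}",
    "process instance 1160 finished.",
]

if __name__ == "__main__":
    print(duplicated(hosts_per_task(SAMPLE)))
```

In these logs, `status=6` and `status=9` arrive at the same timestamps as the worker's "task final status : FAILURE" and "task final status : KILL" lines, so the per-host status lists give a quick view of which host reported which outcome.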

What you expected to happen

Task can run successfully.

How to reproduce

Run several hundred tasks at the same time. A few of them will fail to run normally.
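
One possible reading of the worker log (an assumption, not a confirmed root cause) is that two executions of the same task instance shared one execute path, and the first execution cleared that directory just as the second was about to run its command file. The sketch below replays that interleaving step by step on a temporary directory. `simulate_shared_exec_dir` is a hypothetical helper, and it only touches the file system; it does not call any DolphinScheduler code.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_shared_exec_dir():
    """Replay the suspected interleaving with two attempts, A and B."""
    root = Path(tempfile.mkdtemp())
    exec_dir = root / "1160" / "2293"  # both attempts use the same path
    command = exec_dir / "1160_2293.command"
    events = []
    try:
        # Attempt A creates the directory and its command file, then fails.
        exec_dir.mkdir(parents=True, exist_ok=True)
        command.write_text("echo attempt A\n")
        events.append("A wrote command file")

        # Attempt B starts in the same directory and rewrites the file.
        exec_dir.mkdir(parents=True, exist_ok=True)
        command.write_text("echo attempt B\n")
        events.append("B wrote command file")

        # Attempt A's cleanup runs before B has launched its command.
        shutil.rmtree(exec_dir)
        events.append("A cleared exec dir")

        # Attempt B now tries to use a file that no longer exists.
        try:
            command.read_text()
            events.append("B ran")
        except FileNotFoundError:
            events.append("B failed: no such file")
    finally:
        shutil.rmtree(root, ignore_errors=True)
    return events


if __name__ == "__main__":
    for event in simulate_shared_exec_dir():
        print(event)
```

The order of events here is forced so that the failure always appears. On a real worker it would depend on timing, which may be why only a few of the several hundred tasks fail.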

Anything else

No response

Version

dev

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 2 years ago

Search before asking

What happened

When I start about hundreds process, I get 2 alert info. {"id":68,"title":"stop failed","content":"[]","log":null,"warnType":2} {"id":69,"title":"scheduler failed","content":"[{\"projectCode\":853772144975872,\"projectName\":\"near_realtime\",\"owner\":\"admin\",\"processId\":1160,\"processDefinitionCode\":853779489988608,\"processName\":\"ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925\",\"taskCode\":853779488874496,\"taskName\":\"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)\",\"taskType\":\"SHELL\",\"taskState\":\"KILL\",\"taskStartTime\":\"2022-06-13 19:00:01\",\"taskEndTime\":\"2022-06-13 19:00:01\",\"taskHost\":\"10.3.7.60:1234\",\"logPath\":\"/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log\"}]","log":null,"warnType":2} Then I observed the master and worker logs. Master Log [INFO] 2022-06-13 19:00:00.948 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[203] - handle command 1405 end, create process instance 1160 [INFO] 2022-06-13 19:00:00.999 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[1286] - start submit task : db2hive(stg_sqoop_retail_kc_stockorder_detail_hour), instance id:1160, state: RUNNING_EXECUTION [INFO] 2022-06-13 19:00:01.002 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[1300] - end submit task to db successfully:2293 db2hive(stg_sqoop_retail_kc_stockorder_detail_hour) state:SUBMITTED_SUCCESS complete, instance id:1160 state: RUNNING_EXECUTION [INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', 
executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.60:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.62:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.102 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.110 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.61:1234', status=1, 
logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.114 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=6, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.61:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=114874, appIds='', varPool=[]} [INFO] 2022-06-13 19:00:01.118 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=9, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.61:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=0, appIds='null', varPool=[]} [INFO] 2022-06-13 19:00:01.122 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - taskExecuteRunningCommand: TaskExecuteRunningCommand{taskInstanceId=2293, processInstanceId='1160', startTime=Mon Jun 13 19:00:01 CST 2022, host='10.3.7.60:1234', status=1, logPath='/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log', executePath='/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293', processId=0', appIds='null'} [INFO] 2022-06-13 19:00:01.134 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : 
TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=6, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.60:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=20981, appIds='', varPool=[]} [INFO] 2022-06-13 19:00:01.246 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=9, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:00:01 CST 2022, host=10.3.7.60:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=21003, appIds='null', varPool=[]} [INFO] 2022-06-13 19:00:01.625 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[127] - persist events 1160 succeeded. 
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[448] - work flow 1160 task id:2293 code:853779488874496 state:KILL
[INFO] 2022-06-13 19:00:01.690 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[1675] - work flow process instance [id: 1160, name:ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925], state change from RUNNING_EXECUTION to FAILURE, cmd type: SCHEDULER
[INFO] 2022-06-13 19:00:01.693 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: FAILURE task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.694 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: KILL task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: FAILURE task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: KILL task instance id: 2293 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[314] - process event: State Event :key: null type: PROCESS_STATE_CHANGE executeStatus: FAILURE task instance id: 0 process instance id: 1160 context: null
[INFO] 2022-06-13 19:00:01.695 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[680] - process:1160 state FAILURE change to FAILURE
[INFO] 2022-06-13 19:00:01.723 +0800 org.apache.dolphinscheduler.service.alert.ProcessAlertManager:[237] - add alert to db , alert: Alert{id=69, sign='acf0fc20f9dbaf2294d78c959cf7bfc7501f6f09', title='scheduler failed', content='[{"projectCode":853772144975872,"projectName":"near_realtime","owner":"admin","processId":1160,"processDefinitionCode":853779489988608,"processName":"ods_retail_kc_stockorder_detail_sqoop_hour-1-20220613190000925","taskCode":853779488874496,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","taskType":"SHELL","taskState":"KILL","taskStartTime":"2022-06-13 19:00:01","taskEndTime":"2022-06-13 19:00:01","taskHost":"10.3.7.60:1234","logPath":"/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log"}]', alertStatus=null, warningType=FAILURE, log='null', alertGroupId=1, createTime=Mon Jun 13 19:00:01 CST 2022, updateTime=null, info={}}
[INFO] 2022-06-13 19:00:01.724 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[131] - process instance 1160 finished.
[INFO] 2022-06-13 19:02:23.209 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=7, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:02:23 CST 2022, host=10.3.7.62:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=5718, appIds='application_1649386217358_1093922', varPool=[]}
[INFO] 2022-06-13 19:06:35.416 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[60] - received command : TaskExecuteResponseCommand{taskInstanceId=2293, processInstanceId=1160, status=7, startTime=Mon Jun 13 19:00:01 CST 2022, endTime=Mon Jun 13 19:02:23 CST 2022, host=10.3.7.62:1234, logPath=/opt/soft/dolphinscheduler/worker-server/logs/20220613/853779489988608_1-1160-2293.log, executePath=/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId=5718, appIds='application_1649386217358_1093922', varPool=[]}

Worker Log

[INFO] 2022-06-13 19:00:01.096 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[107] - task execute request command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":2293,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","firstSubmitTime":"2022-06-13 19:00:01","startTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":853779489988608,"processDefineVersion":1,"appIds":null,"processInstanceId":1160,"scheduleTime":"2022-06-13 19:00:00","globalParams":"[{\"prop\":\"day\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"2022-06-13\"},{\"prop\":\"dt\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"20220613\"}]","executorId":3,"cmdTypeIfComplement":6,"tenantCode":"dwadmin","queue":"default","processDefineId":0,"projectId":0,"projectCode":853772144975872,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"#!/bin/bash\\nbash /dev/program/env/bin/dolphin_db2hive.sh stg/stg_sqoop_retail_kc_stockorder_detail.db2hive ${day} ${dt}\",\"dependence\":{},\"conditionResult\":{\"successNode\":[\"\"],\"failedNode\":[\"\"]},\"waitStartTimeout\":{\"strategy\":\"FAILED\",\"interval\":null,\"checkInterval\":null,\"enable\":false}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":"SUBMITTED_SUCCESS","taskLogName":null,"resourceParametersHelper":null,"endTime":null,"k8sTaskExecutionContext":{"configYaml":null},"resources":{},"varPool":null,"dryRun":0,"paramsMap":null,"dataQualityTaskExecutionContext":{"ruleId":0,"ruleName":null,"ruleType":0,"ruleInputEntryList":null,"executeSqlList":null,"comparisonNeedStatisticsValueTable":false,"compareWithFixedValue":false,"hdfsPath":null,"sourceConnectorType":null,"sourceType":0,"sourceConnectionParams":null,"targetConnectorType":null,"targetType":0,"targetConnectionParams":null,"writerConnectorType":null,"writerType":0,"writerTable":null,"writerConnectionParams":null,"statisticsValueConnectorType":null,"statisticsValueType":0,"statisticsValueTable":null,"statisticsValueWriterConnectionParams":null},"cpuQuota":-1,"memoryMax":-1}'}
[INFO] 2022-06-13 19:00:01.097 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[107] - task execute request command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":2293,"taskName":"db2hive(stg_sqoop_retail_kc_stockorder_detail_hour)","firstSubmitTime":"2022-06-13 19:00:01","startTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":853779489988608,"processDefineVersion":1,"appIds":null,"processInstanceId":1160,"scheduleTime":"2022-06-13 19:00:00","globalParams":"[{\"prop\":\"day\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"2022-06-13\"},{\"prop\":\"dt\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"20220613\"}]","executorId":3,"cmdTypeIfComplement":6,"tenantCode":"dwadmin","queue":"default","processDefineId":0,"projectId":0,"projectCode":853772144975872,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"#!/bin/bash\\nbash /dev/program/env/bin/dolphin_db2hive.sh stg/stg_sqoop_retail_kc_stockorder_detail.db2hive ${day} ${dt}\",\"dependence\":{},\"conditionResult\":{\"successNode\":[\"\"],\"failedNode\":[\"\"]},\"waitStartTimeout\":{\"strategy\":\"FAILED\",\"interval\":null,\"checkInterval\":null,\"enable\":false}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":"SUBMITTED_SUCCESS","taskLogName":null,"resourceParametersHelper":null,"endTime":null,"k8sTaskExecutionContext":{"configYaml":null},"resources":{},"varPool":null,"dryRun":0,"paramsMap":null,"dataQualityTaskExecutionContext":{"ruleId":0,"ruleName":null,"ruleType":0,"ruleInputEntryList":null,"executeSqlList":null,"comparisonNeedStatisticsValueTable":false,"compareWithFixedValue":false,"hdfsPath":null,"sourceConnectorType":null,"sourceType":0,"sourceConnectionParams":null,"targetConnectorType":null,"targetType":0,"targetConnectionParams":null,"writerConnectorType":null,"writerType":0,"writerTable":null,"writerConnectionParams":null,"statisticsValueConnectorType":null,"statisticsValueType":0,"statisticsValueTable":null,"statisticsValueWriterConnectionParams":null},"cpuQuota":-1,"memoryMax":-1}'}
[INFO] 2022-06-13 19:00:01.098 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[154] - task instance local execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[131] - script path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.099 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[135] - the task begins to execute. task instance id: 2293
[INFO] 2022-06-13 19:00:01.100 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[154] - task instance local execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[139] - task execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[85] - tenantCode user:dwadmin, task dir:1160_2293
[INFO] 2022-06-13 19:00:01.104 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293_node.sh
[INFO] 2022-06-13 19:00:01.108 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[331] - task run command: sudo -u dwadmin sh /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
[INFO] 2022-06-13 19:00:01.118 +0800 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.119 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[131] - script path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.119 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[135] - the task begins to execute. task instance id: 2293
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[139] - task execute path : /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[246] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293, processId:20981 ,exitStatusCode:127 ,processWaitForStatus:true ,processExitValue:127
[INFO] 2022-06-13 19:00:01.126 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[85] - tenantCode user:dwadmin, task dir:1160_2293
[INFO] 2022-06-13 19:00:01.127 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293_node.sh
[INFO] 2022-06-13 19:00:01.130 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[191] - task instance id : 2293,task final status : FAILURE
[INFO] 2022-06-13 19:00:01.130 +0800 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[331] - task run command: sudo -u dwadmin sh /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command
[INFO] 2022-06-13 19:00:01.134 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[233] - exec local path: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293 cleared.
[INFO] 2022-06-13 19:00:01.244 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[191] - task instance id : 2293,task final status : KILL
[INFO] 2022-06-13 19:00:01.244 +0800 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[233] - exec local path: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293 cleared.
[INFO] 2022-06-13 19:00:01.608 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteRunningAckProcessor:[56] - task execute running ack command : TaskExecuteRunningAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.622 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteRunningAckProcessor:[56] - task execute running ack command : TaskExecuteRunningAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.625 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteResponseAckProcessor:[56] - task execute response ack command : TaskExecuteResponseAckCommand{taskInstanceId=2293, status=7}
[INFO] 2022-06-13 19:00:01.626 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteResponseAckProcessor:[56] - task execute response ack command : TaskExecuteResponseAckCommand{taskInstanceId=2293, status=7}
sh: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command: No such file or directory
sh: /tmp/dolphinscheduler/exec/process/853772144975872/853779489988608_1/1160/2293/1160_2293.command: No such file or directory

What you expected to happen

The task should run successfully.

How to reproduce

Run several hundred tasks at the same time; a few of them will fail to run normally.
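
When reproducing this at scale, it can help to check whether a worker received the same task instance more than once, as the worker log above shows for task instance 2293 (two `task execute request command` entries one millisecond apart). The following is a minimal sketch, not part of DolphinScheduler: the class name `DuplicateDispatchFinder` and its method are hypothetical, and the regular expression assumes the log wording shown in this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how often each task instance id appears in
// "task execute request command" entries of a worker log.
final class DuplicateDispatchFinder {

    // Assumes the entry format seen in this issue:
    // ... task execute request command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":2293,...
    private static final Pattern REQUEST =
            Pattern.compile("task execute request command.*?\"taskInstanceId\":(\\d+)");

    /** Returns taskInstanceId -> request count, keeping only ids seen at least twice. */
    static Map<Integer, Integer> findDuplicates(List<String> logLines) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            // A flattened log may hold several entries on one physical line.
            while (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java DuplicateDispatchFinder <worker-log-file>");
            return;
        }
        System.out.println(findDuplicates(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

Any task instance ID this reports was requested at least twice on that worker and is a candidate for the failure described here.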

Anything else

No response

Version

dev

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 2 years ago

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

caishunfeng commented 2 years ago

It seems the master sent task 2293 twice, which caused the task to fail.
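
One general way to make a receiver tolerant of a double dispatch like this is an idempotency check keyed on the task instance ID, so that a second request for an ID already in flight is dropped instead of starting a second execution in the same directory. The sketch below only illustrates that idea: `DispatchGuard`, `tryClaim`, and `release` are hypothetical names, not DolphinScheduler APIs, and this is not necessarily how the project addressed the problem.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: remembers which task instance ids are currently executing.
final class DispatchGuard {

    // Thread-safe set backed by ConcurrentHashMap; add() is atomic.
    private final Set<Integer> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims this task instance id. */
    boolean tryClaim(int taskInstanceId) {
        return inFlight.add(taskInstanceId);
    }

    /** Releases the id once the task has reached a final state, e.g. to allow a retry. */
    void release(int taskInstanceId) {
        inFlight.remove(taskInstanceId);
    }
}
```

With such a guard, the second request for 2293 would see `tryClaim(2293)` return `false` and could be ignored rather than executed.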

ruanwenjun commented 2 years ago

@shangeyao Please check again; this should be fixed by #10479 on dev. I will close this issue. If the problem persists, we can reopen it.