apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0
12.85k stars 4.62k forks

[Bug] [REMOTESHELL] The remote shell execution was successful, but the status of the task instance did fail. #16376

Closed FN20200222 closed 2 weeks ago

FN20200222 commented 3 months ago

Search before asking

What happened

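The remote shell executed successfully, but the task instance status is failed.

What you expected to happen

The remote shell executed successfully, but the task instance status is failed.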

How to reproduce

1721962550597

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 3 months ago

Search before asking

What happened

The remote shell executed successfully, but the task instance status is failed.

What you expected to happen

The remote shell executed successfully, but the task instance status is failed.

How to reproduce

1721962550597

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

Code of Conduct

SbloodyS commented 3 months ago

Please provide more logs. @FN20200222

starrysxy commented 3 months ago

I see there is an echo ... after mkdir a1. Maybe you need to add set -e to your shell script, right after #!/bin/bash.

#!/bin/bash

set -e

# something else

With this, the script exits immediately with a non-zero status as soon as a command fails, so the exit status reflects the real result of your shell script.

wangxj3 commented 3 months ago

Perhaps this is not the solution to this issue. According to the log, the parsed status should be 0, but the log prints "Remote shell task failed" without including the status.

zhangconan commented 2 months ago

I ran into a similar problem. The error reported is:

[INFO] 2024-09-04 11:37:24.980 +0800 - The final script is:
#!/bin/bash
echo 'zhangsan'; echo DOLPHINSCHEDULER-REMOTE-SHELL-TASK-STATUS-$?
[INFO] 2024-09-04 11:37:25.056 +0800 - Remote shell task log:
[INFO] 2024-09-04 11:37:25.250 +0800 - zhangsan DOLPHINSCHEDULER-REMOTE-SHELL-TASK-STATUS-0
[INFO] 2024-09-04 11:37:25.324 +0800 - Remote shell task run status: DOLPHINSCHEDULER-REMOTE-SHELL-TASK-STATUS-0
[ERROR] 2024-09-04 11:37:25.325 +0800 - Remote shell task failed
[ERROR] 2024-09-04 11:37:25.332 +0800 - shell task error
org.apache.dolphinscheduler.plugin.task.api.TaskException: Remote shell task error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:100)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:104)
    at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerTaskExecutor.executeTask(DefaultWorkerTaskExecutor.java:51)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecutor.run(WorkerTaskExecutor.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NumberFormatException: For input string: "0 "
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskExitCode(RemoteExecutor.java:140)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:98)
    ... 6 common frames omitted
[ERROR] 2024-09-04 11:37:25.333 +0800 - Task execute failed, due to meet an exception
org.apache.dolphinscheduler.plugin.task.api.TaskException: Execute shell task error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:110)
    at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerTaskExecutor.executeTask(DefaultWorkerTaskExecutor.java:51)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecutor.run(WorkerTaskExecutor.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: Remote shell task error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:100)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.handle(RemoteShellTask.java:104)
    ... 5 common frames omitted
Caused by: java.lang.NumberFormatException: For input string: "0 "
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskExitCode(RemoteExecutor.java:140)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.run(RemoteExecutor.java:98)
    ... 6 common frames omitted
[INFO] 2024-09-04 11:37:25.334 +0800 - kill remote task dolphinscheduler-remoteshell-72
[ERROR] 2024-09-04 11:37:25.335 +0800 - Cancel task failed, this will not affect the taskInstance status, but you need to check manual
org.apache.dolphinscheduler.plugin.task.api.TaskException: cancel application error
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:121)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecutor.cancelTask(WorkerTaskExecutor.java:133)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecutor.afterThrowing(WorkerTaskExecutor.java:114)
    at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerTaskExecutor.afterThrowing(DefaultWorkerTaskExecutor.java:61)
    at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecutor.run(WorkerTaskExecutor.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.dolphinscheduler.plugin.task.api.TaskException: SSH connection failed
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:82)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.runRemote(RemoteExecutor.java:224)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getTaskPid(RemoteExecutor.java:200)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.kill(RemoteExecutor.java:158)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteShellTask.cancel(RemoteShellTask.java:119)
    ... 7 common frames omitted
Caused by: java.lang.IllegalStateException: SshClient not started. Please call start() method before connecting to a server
    at org.apache.sshd.client.SshClient.doConnect(SshClient.java:627)
    at org.apache.sshd.client.SshClient.doConnect(SshClient.java:616)
    at org.apache.sshd.client.SshClient.connect(SshClient.java:547)
    at org.apache.sshd.client.SshClient.connect(SshClient.java:539)
    at org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:74)
    at org.apache.sshd.client.session.ClientSessionCreator.connect(ClientSessionCreator.java:57)
    at org.apache.dolphinscheduler.plugin.datasource.ssh.SSHUtils.getSession(SSHUtils.java:41)
    at org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor.getSession(RemoteExecutor.java:77)
    ... 11 common frames omitted
[INFO] 2024-09-04 11:37:25.339 +0800 - Get a exception when execute the task, will send the task status: FAILURE to master: 172.18.0.1:1234
[INFO] 2024-09-04 11:37:25.339 +0800 - FINALIZE_SESSION

zhangconan commented 2 months ago

This is a bug in 3.2.2. RemoteExecutor.getTaskExitCode fails while converting the status string to an integer: the log above shows a NumberFormatException for the input string "0 ".

This issue has been resolved in the dev branch.
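
To illustrate the conversion failure in the stack trace, here is a minimal Java sketch. It is not the DolphinScheduler source and not necessarily how the dev branch fixes it. The class and method names (ExitCodeParsing, parseStrict, parseTrimmed) are hypothetical; only the marker text and the input "0 " come from the log. It shows that Integer.parseInt rejects a status with trailing whitespace, and that trimming first yields 0.

```java
public class ExitCodeParsing {
    // Marker text copied from the log in this thread; everything else is hypothetical.
    static final String STATUS_MARKER = "DOLPHINSCHEDULER-REMOTE-SHELL-TASK-STATUS-";

    // Returns the raw text that follows the marker, whitespace included.
    static String rawStatus(String line) {
        int idx = line.indexOf(STATUS_MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("no status marker in: " + line);
        }
        return line.substring(idx + STATUS_MARKER.length());
    }

    // Strict variant: parses the raw text as-is.
    // Integer.parseInt does not accept surrounding whitespace, so "0 " throws.
    static int parseStrict(String line) {
        return Integer.parseInt(rawStatus(line));
    }

    // Tolerant variant: strips leading/trailing whitespace (spaces, \r, \n) first.
    static int parseTrimmed(String line) {
        return Integer.parseInt(rawStatus(line).trim());
    }

    public static void main(String[] args) {
        String line = STATUS_MARKER + "0 "; // same trailing whitespace as in the log
        try {
            System.out.println("strict: " + parseStrict(line));
        } catch (NumberFormatException e) {
            System.out.println("strict failed: " + e.getMessage());
        }
        System.out.println("trimmed: " + parseTrimmed(line));
    }
}
```

Under this reading, the script itself exited with status 0 and only the parsing step turned the run into a failure, which would explain why adding set -e to the script does not help.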

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] commented 2 weeks ago

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.