apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[Bug] [check yarn status failed] DS check of YARN application status fails and the job ends in a failed status. #8469

Closed. superfenv closed this issue 2 years ago.

superfenv commented 2 years ago

Search before asking

What happened

[INFO] 2022-02-21 16:59:51.444 - [taskAppId=TASK-6462-116986-743823]:[445] - find app id: application_1638283119852_49879
[INFO] 2022-02-21 16:59:51.444 - [taskAppId=TASK-6462-116986-743823]:[402] - check yarn application status, appId:application_1638283119852_49879
[DEBUG] 2022-02-21 16:59:51.444 org.apache.dolphinscheduler.common.utils.HadoopUtils:[211] - yarn application url:http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/%s, applicationId:application_1638283119852_49879
[ERROR] 2022-02-21 16:59:51.448 org.apache.dolphinscheduler.common.utils.HttpUtils:[70] - http get:400 response status code is not 200!
[ERROR] 2022-02-21 16:59:51.449 - [taskAppId=TASK-6462-116986-743823]:[418] - yarn applications: application_1638283119852_49879 , query status failed, exception:{}
java.lang.NullPointerException: null
    at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
    at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
    at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
    at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
    at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[INFO] 2022-02-21 16:59:51.449 - [taskAppId=TASK-6462-116986-743823]:[238] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/47/6462/116986/743823, processId:4651 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
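For context, here is a minimal, self-contained sketch of the kind of status check the worker performs against the ResourceManager REST API. This is an illustration, not the actual HadoopUtils/HttpUtils code; the URL is the one from the log above, the class name is made up, and the interpretation that a non-200 response leads to a null body being parsed (hence the NullPointerException) is an inference from the stack trace.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.stream.Collectors;

// Illustration only, not DolphinScheduler source: query the YARN ResourceManager
// REST API for an application's status, the same endpoint the worker hits above.
public class YarnStatusCheckSketch {
    public static void main(String[] args) throws Exception {
        String url = "http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/"
                + "application_1638283119852_49879";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");

        if (conn.getResponseCode() != 200) {
            // This mirrors the "response status code is not 200!" error in the log.
            // If the caller then treats the missing body as null and parses it anyway,
            // the result is the NullPointerException seen in getApplicationStatus.
            System.err.println("http get:" + conn.getResponseCode()
                    + " response status code is not 200!");
            return;
        }

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            // A real check would parse this JSON and read app.finalStatus
            // (SUCCEEDED / FAILED / KILLED) to decide the task state.
            System.out.println(reader.lines().collect(Collectors.joining()));
        }
    }
}
```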

What you expected to happen

DS schedules a shell task that runs a Sqoop job on YARN; the task should be reported as successful when the YARN application succeeds.

How to reproduce

data.basedir.path=/tmp/dolphinscheduler
resource.storage.type=HDFS
resource.upload.path=/data/dolphinscheduler
hadoop.security.authentication.startup.state=true
java.security.krb5.conf.path=/etc/krb5.conf
login.user.keytab.username=dolphinscheduler@CICC.COM
login.user.keytab.path=/home/dolphinscheduler/dolphinscheduler.keytab
kerberos.expire.time=2
hdfs.root.user=
fs.defaultFS=hdfs://zj1-dipper10-hadoop02.cicc.com
fs.s3a.endpoint=
fs.s3a.access.key=
fs.s3a.secret.key=
resource.manager.httpaddress.port=8088
yarn.application.status.address=http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/%s
development.state=false

Anything else

curl http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/application_1638283119852_49879

{"app":{"id":"application_1638283119852_49879","user":"ciccetluser","name":"ODS_ODTSML_EDS_CONTRACTMARGIN","queue":"cicc","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://zj1-dipper10-hadoop02.cicc.com:8088/proxy/application_1638283119852_49879/","diagnostics":"","clusterId":1638283119852,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1645433963400,"launchTime":1645433963691,"finishedTime":1645433989225,"elapsedTime":25825,"amContainerLogs":"http://zj1-dipper10-hadoop05.cicc.com:8042/node/containerlogs/container_1638283119852_49879_01_000001/ciccetluser","amHostHttpAddress":"zj1-dipper10-hadoop05.cicc.com:8042","amRPCAddress":"zj1-dipper10-hadoop05.cicc.com:45015","allocatedMB":-1,"allocatedVCores":-1,"reservedMB":-1,"reservedVCores":-1,"runningContainers":-1,"memorySeconds":169913,"vcoreSeconds":44,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"resourceSecondsMap":{"entry":{"key":"memory-mb","value":"169913"},"entry":{"key":"vcores","value":"44"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}}

Version

1.3.9

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 2 years ago

Hi:

Yao-MR-zz commented 2 years ago

hi @superfenv, I don't think this is a bug; it looks like an issue with your configuration.

The config you supplied is as follows:

data.basedir.path=/tmp/dolphinscheduler
resource.storage.type=HDFS
resource.upload.path=/data/dolphinscheduler
hadoop.security.authentication.startup.state=true
java.security.krb5.conf.path=/etc/krb5.conf
login.user.keytab.username=dolphinscheduler@CICC.COM
login.user.keytab.path=/home/dolphinscheduler/dolphinscheduler.keytab
kerberos.expire.time=2
hdfs.root.user=
fs.defaultFS=hdfs://zj1-dipper10-hadoop02.cicc.com
fs.s3a.endpoint=
fs.s3a.access.key=
fs.s3a.secret.key=
resource.manager.httpaddress.port=8088
yarn.application.status.address=http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/%s
development.state=false

The problem is that yarn.application.status.address is configured incorrectly: the port should not be hard-coded in the status URL.

Wrong: yarn.application.status.address=http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/%s
Right: yarn.application.status.address=http://zj1-dipper10-hadoop02.cicc.com:%s/ws/v1/cluster/apps/%s

Instead, specify the port separately in the property resource.manager.httpaddress.port=8088.
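A minimal sketch of why the hard-coded port breaks the lookup. The assumption here (not confirmed from source, just inferred from the config format and the 400 in the log) is that the worker fills this template with two arguments, the ResourceManager HTTP port and the application id, via String.format; the class name below is made up for illustration.

```java
// Illustration only: how the two templates behave when formatted with
// (resourceManagerPort, applicationId), which appears to be what the worker does.
public class YarnStatusUrlTemplateDemo {
    public static void main(String[] args) {
        String port = "8088"; // resource.manager.httpaddress.port
        String appId = "application_1638283119852_49879";

        // Wrong: only one %s, so the first argument (the port) fills it and the
        // application id is silently dropped -> the RM returns 400 for .../apps/8088.
        String wrong = "http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/%s";
        System.out.println(String.format(wrong, port, appId));

        // Right: the first %s takes the port, the second %s takes the application id.
        String right = "http://zj1-dipper10-hadoop02.cicc.com:%s/ws/v1/cluster/apps/%s";
        System.out.println(String.format(right, port, appId));
        // -> http://zj1-dipper10-hadoop02.cicc.com:8088/ws/v1/cluster/apps/application_1638283119852_49879
    }
}
```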

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.