Closed. KingSpring closed this issue 2 years ago.
When I run a shell task for testing MapReduce in DS (see image below), the DS web log shows: yarn status get failed.
Shell content:
hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10
DS web log:
[INFO] 2021-10-26 10:34:28.745 - [taskAppId=TASK-1-6-89]:[115] - create dir success /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.754 - [taskAppId=TASK-1-6-89]:[88] - shell task params {"rawScript":"hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10","localParams":[],"resourceList":[]}
[INFO] 2021-10-26 10:34:28.758 - [taskAppId=TASK-1-6-89]:[154] - raw script : hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10
[INFO] 2021-10-26 10:34:28.759 - [taskAppId=TASK-1-6-89]:[155] - task execute path : /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[87] - tenantCode user:root, task dir:1_6_89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[92] - create command file:/exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[111] - command : #!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /opt/app/dolphinscheduler/conf/env/dolphinscheduler_env.sh
/exec/process/1/1/6/89/1_6_89_node.sh
[INFO] 2021-10-26 10:34:28.764 - [taskAppId=TASK-1-6-89]:[330] - task run command:
sudo -u root sh /exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.773 - [taskAppId=TASK-1-6-89]:[211] - process start, process id is: 19627
[INFO] 2021-10-26 10:34:29.774 - [taskAppId=TASK-1-6-89]:[138] - -> SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Number of Maps = 10
Samples per Map = 10
[INFO] 2021-10-26 10:34:31.775 - [taskAppId=TASK-1-6-89]:[138] - -> Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
21/10/26 10:34:31 INFO client.RMProxy: Connecting to ResourceManager at hadoop47/192.168.80.47:8032
[INFO] 2021-10-26 10:34:32.776 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:32 INFO input.FileInputFormat: Total input files to process : 10
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: number of splits:10
21/10/26 10:34:32 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1634958933716_0113
21/10/26 10:34:32 INFO impl.YarnClientImpl: Submitted application application_1634958933716_0113
21/10/26 10:34:32 INFO mapreduce.Job: The url to track the job: http://hadoop47:8088/proxy/application_1634958933716_0113/
21/10/26 10:34:32 INFO mapreduce.Job: Running job: job_1634958933716_0113
[INFO] 2021-10-26 10:34:40.785 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:39 INFO mapreduce.Job: Job job_1634958933716_0113 running in uber mode : false
21/10/26 10:34:39 INFO mapreduce.Job: map 0% reduce 0%
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
21/10/26 10:35:02 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=2205654
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2630
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=149819
Total time spent by all reduces in occupied slots (ms)=3113
Total time spent by all map tasks (ms)=149819
Total time spent by all reduce tasks (ms)=3113
Total vcore-milliseconds taken by all map tasks=149819
Total vcore-milliseconds taken by all reduce tasks=3113
Total megabyte-milliseconds taken by all map tasks=153414656
Total megabyte-milliseconds taken by all reduce tasks=3187712
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1450
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=6825
CPU time spent (ms)=4980
Physical memory (bytes) snapshot=3529900032
Virtual memory (bytes) snapshot=22377988096
Total committed heap usage (bytes)=2413297664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 30.695 seconds
Estimated value of Pi is 3.20000000000000000000
Worker debug log:
[DEBUG] 2021-10-26 10:34:56.708 org.apache.zookeeper.ClientCnxn:[846] - Reading reply sessionid:0x20015bfe8a400c9, packet:: clientPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 serverPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 finished:false header:: 2933,4 replyHeader:: 2933,17180717039,0 request:: '/dolphinscheduler/nodes/worker/default/192.168.80.49:1234,T response:: #302e332c302e39312c302e35392c312e33372c382e302c302e332c323032312d31302d32362030393a32373a30362c323032312d31302d32362031303a33343a35362c302c34303937,s{17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701}
[DEBUG] 2021-10-26 10:34:56.708 org.apache.dolphinscheduler.service.zk.ZookeeperCachedOperator:[62] - zookeeperListener:org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$WorkerGroupNodeListener triggered
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[396] - processResult: CuratorEventImpl{type=GET_DATA, resultCode=0, path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', name='null', children=null, context=null, stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701 , data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55], watchedEvent=null, aclList=null, opResults=null}
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[857] - publishEvent: TreeCacheEvent{type=NODE_UPDATED, data=ChildData{path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701 , data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55]}}
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[DEBUG] 2021-10-26 10:34:58.313 org.apache.zookeeper.ClientCnxn:[745] - Got ping response for sessionid: 0x30015c0a38d009d after 0ms
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[DEBUG] 2021-10-26 10:35:02.715 org.apache.dolphinscheduler.common.utils.HadoopUtils:[211] - yarn application url:http://hadoop47:%s/ws/v1/cluster/apps/%s, applicationId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.common.utils.HttpUtils:[73] - Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
org.apache.http.conn.HttpHostConnectException: Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.dolphinscheduler.common.utils.HttpUtils.get(HttpUtils.java:60)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:420)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
... 20 common frames omitted
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[147] - task instance id : 89,task final status : FAILURE
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[185] - develop mode is: false
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[203] - exec local path: /exec/process/1/1/6/89 cleared.
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
What you expected to happen:
The status of Yarn application_1634958933716_0113 should always be retrievable.
How to reproduce:
Server: KunPeng, OS: CentOS 7, DS release: 1.3.9, Hadoop version: 2.9.2, Yarn HA: false
conf/common.properties:
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; if resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://hadoop47:%s/ws/v1/cluster/apps/%s
Anything else:
It does not fail every time, but the probability of hitting this error is high.
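For comparison, here is a sketch of the same three properties with the port filled in explicitly; 8088 is the stock ResourceManager web port, and setting it is what made the workflow pass later in this thread:

# conf/common.properties (sketch for a single, non-HA ResourceManager, based on the values above)
resource.manager.httpaddress.port=8088
yarn.resourcemanager.ha.rm.ids=
yarn.application.status.address=http://hadoop47:%s/ws/v1/cluster/apps/%s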
Hi:
I added a log statement in org.apache.dolphinscheduler.common.utils.HadoopUtils#getApplicationUrl:
logger.info("yarn application url:{}", String.format(appUrl, activeResourceManagerPort, applicationId));
The exception in the log is:
----------------log begin----------------------------
[INFO] 2021-10-26 16:46:26.257 - [taskAppId=TASK-1-6-91]:[402] - check yarn application status, appId:application_1634958933716_0116
[INFO] 2021-10-26 16:46:26.300 org.apache.dolphinscheduler.common.utils.PropertyUtils:[140] - For input string: ""
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.dolphinscheduler.common.utils.PropertyUtils.getInt(PropertyUtils.java:138)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.
Then I set resource.manager.httpaddress.port=8088 in conf/common.properties (the comment above that key says the default value is 8088 if not specified), and the DS workflow is OK. But I don't know why: is my configuration set wrong, or does the DS code have a bug?
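For what it's worth, here is a minimal sketch (a hypothetical helper, not the actual DolphinScheduler code) of the kind of fallback that would avoid this: parse resource.manager.httpaddress.port defensively and fall back to 8088 when it is blank, so the first %s in yarn.application.status.address is always filled with a real port.

// Hypothetical sketch, not DolphinScheduler source: resolve the ResourceManager HTTP port
// with a default of 8088 when resource.manager.httpaddress.port is left empty, then build
// the application status URL from the yarn.application.status.address template.
public final class YarnStatusUrlSketch {

    private static final int DEFAULT_RM_HTTP_PORT = 8088;

    static int resolveRmHttpPort(String rawPortProperty) {
        if (rawPortProperty == null || rawPortProperty.trim().isEmpty()) {
            return DEFAULT_RM_HTTP_PORT;   // avoids Integer.parseInt("") -> NumberFormatException
        }
        return Integer.parseInt(rawPortProperty.trim());
    }

    static String applicationStatusUrl(String template, String rawPortProperty, String applicationId) {
        // template example: http://hadoop47:%s/ws/v1/cluster/apps/%s
        return String.format(template, resolveRmHttpPort(rawPortProperty), applicationId);
    }

    public static void main(String[] args) {
        // With the port property empty, the URL still gets a usable port instead of falling back to :80.
        System.out.println(applicationStatusUrl(
                "http://hadoop47:%s/ws/v1/cluster/apps/%s",
                "",
                "application_1634958933716_0113"));
        // prints: http://hadoop47:8088/ws/v1/cluster/apps/application_1634958933716_0113
    }
}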
The image shows that when resource.manager.httpaddress.port in common.properties is empty, DolphinScheduler cannot build a working connection to YARN; it connects to the ResourceManager host with no port in the YARN URL. I think this is a bug; at the very least the documentation should be updated so people can avoid this problem.
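As a quick way to see the difference (my own suggestion, assuming the standard YARN ResourceManager REST API on its default web port), compare the URL the worker ends up calling with the one that includes the port:

curl http://hadoop47/ws/v1/cluster/apps/application_1634958933716_0113        # defaults to port 80: connection refused, as in the worker log
curl http://hadoop47:8088/ws/v1/cluster/apps/application_1634958933716_0113   # default RM web port: returns the application report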
@lenboo What do you think? Is this the same issue we discussed yesterday in the WeChat group?
Don't you test this feature yourselves? These feel like very basic errors that reproduce every time, especially when upgrading from an older version: the old version did not have this parameter, and the new version is supposed to work even without it by default.