Closed. KingSpring closed this issue 2 years ago.
When I run a shell task for testing MapReduce in DS (see image below), the DS web log shows: yarn status get failed.
Shell content:
hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10
DS web log:
[INFO] 2021-10-26 10:34:28.745 - [taskAppId=TASK-1-6-89]:[115] - create dir success /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.754 - [taskAppId=TASK-1-6-89]:[88] - shell task params {"rawScript":"hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10","localParams":[],"resourceList":[]}
[INFO] 2021-10-26 10:34:28.758 - [taskAppId=TASK-1-6-89]:[154] - raw script : hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10
[INFO] 2021-10-26 10:34:28.759 - [taskAppId=TASK-1-6-89]:[155] - task execute path : /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[87] - tenantCode user:root, task dir:1_6_89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[92] - create command file:/exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[111] - command : #!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /opt/app/dolphinscheduler/conf/env/dolphinscheduler_env.sh
/exec/process/1/1/6/89/1_6_89_node.sh
[INFO] 2021-10-26 10:34:28.764 - [taskAppId=TASK-1-6-89]:[330] - task run command:
sudo -u root sh /exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.773 - [taskAppId=TASK-1-6-89]:[211] - process start, process id is: 19627
[INFO] 2021-10-26 10:34:29.774 - [taskAppId=TASK-1-6-89]:[138] - -> SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Number of Maps = 10
Samples per Map = 10
[INFO] 2021-10-26 10:34:31.775 - [taskAppId=TASK-1-6-89]:[138] - -> Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
21/10/26 10:34:31 INFO client.RMProxy: Connecting to ResourceManager at hadoop47/192.168.80.47:8032
[INFO] 2021-10-26 10:34:32.776 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:32 INFO input.FileInputFormat: Total input files to process : 10
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: number of splits:10
21/10/26 10:34:32 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1634958933716_0113
21/10/26 10:34:32 INFO impl.YarnClientImpl: Submitted application application_1634958933716_0113
21/10/26 10:34:32 INFO mapreduce.Job: The url to track the job: http://hadoop47:8088/proxy/application_1634958933716_0113/
21/10/26 10:34:32 INFO mapreduce.Job: Running job: job_1634958933716_0113
[INFO] 2021-10-26 10:34:40.785 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:39 INFO mapreduce.Job: Job job_1634958933716_0113 running in uber mode : false
21/10/26 10:34:39 INFO mapreduce.Job: map 0% reduce 0%
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
21/10/26 10:35:02 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=2205654
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2630
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=149819
Total time spent by all reduces in occupied slots (ms)=3113
Total time spent by all map tasks (ms)=149819
Total time spent by all reduce tasks (ms)=3113
Total vcore-milliseconds taken by all map tasks=149819
Total vcore-milliseconds taken by all reduce tasks=3113
Total megabyte-milliseconds taken by all map tasks=153414656
Total megabyte-milliseconds taken by all reduce tasks=3187712
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1450
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=6825
CPU time spent (ms)=4980
Physical memory (bytes) snapshot=3529900032
Virtual memory (bytes) snapshot=22377988096
Total committed heap usage (bytes)=2413297664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 30.695 seconds
Estimated value of Pi is 3.20000000000000000000
Worker debug log:
[DEBUG] 2021-10-26 10:34:56.708 org.apache.zookeeper.ClientCnxn:[846] - Reading reply sessionid:0x20015bfe8a400c9, packet:: clientPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 serverPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 finished:false header:: 2933,4 replyHeader:: 2933,17180717039,0 request:: '/dolphinscheduler/nodes/worker/default/192.168.80.49:1234,T response:: #302e332c302e39312c302e35392c312e33372c382e302c302e332c323032312d31302d32362030393a32373a30362c323032312d31302d32362031303a33343a35362c302c34303937,s{17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701}
[DEBUG] 2021-10-26 10:34:56.708 org.apache.dolphinscheduler.service.zk.ZookeeperCachedOperator:[62] - zookeeperListener:org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$WorkerGroupNodeListener triggered
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[396] - processResult: CuratorEventImpl{type=GET_DATA, resultCode=0, path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', name='null', children=null, context=null, stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701 , data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55], watchedEvent=null, aclList=null, opResults=null}
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[857] - publishEvent: TreeCacheEvent{type=NODE_UPDATED, data=ChildData{path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701 , data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55]}}
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[DEBUG] 2021-10-26 10:34:58.313 org.apache.zookeeper.ClientCnxn:[745] - Got ping response for sessionid: 0x30015c0a38d009d after 0ms
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[DEBUG] 2021-10-26 10:35:02.715 org.apache.dolphinscheduler.common.utils.HadoopUtils:[211] - yarn application url:http://hadoop47:%s/ws/v1/cluster/apps/%s, applicationId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.common.utils.HttpUtils:[73] - Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
org.apache.http.conn.HttpHostConnectException: Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.dolphinscheduler.common.utils.HttpUtils.get(HttpUtils.java:60)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:420)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
... 20 common frames omitted
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[147] - task instance id : 89,task final status : FAILURE
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[185] - develop mode is: false
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[203] - exec local path: /exec/process/1/1/6/89 cleared.
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
What you expected to happen:
The status of Yarn application_1634958933716_0113 should always be retrievable.
How to reproduce:
Server: KunPeng, OS: CentOS 7, DS release: 1.3.9, Hadoop version: 2.9.2, Yarn HA: false
conf/common.properties:
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; if resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://hadoop47:%s/ws/v1/cluster/apps/%s
Anything else:
It does not fail every time, but the probability of hitting this error is high.
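For comparison, here is a sketch of the same three properties with the port filled in explicitly; 8088 is the stock ResourceManager web port, and setting it is what made the workflow pass later in this thread:

# conf/common.properties (sketch for a single, non-HA ResourceManager, based on the values above)
resource.manager.httpaddress.port=8088
yarn.resourcemanager.ha.rm.ids=
yarn.application.status.address=http://hadoop47:%s/ws/v1/cluster/apps/%s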
Hi:
I added a log statement in org.apache.dolphinscheduler.common.utils.HadoopUtils#getApplicationUrl:
logger.info("yarn application url:{}", String.format(appUrl, activeResourceManagerPort, applicationId));
The exception in the log is:
----------------log begin----------------------------
[INFO] 2021-10-26 16:46:26.257 - [taskAppId=TASK-1-6-91]:[402] - check yarn application status, appId:application_1634958933716_0116
[INFO] 2021-10-26 16:46:26.300 org.apache.dolphinscheduler.common.utils.PropertyUtils:[140] - For input string: ""
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.dolphinscheduler.common.utils.PropertyUtils.getInt(PropertyUtils.java:138)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.
Then I set resource.manager.httpaddress.port=8088 in conf/common.properties (the comment above that key says the default value is 8088 if not specified), and the DS workflow is OK. But I don't know why: is my configuration set wrong, or does the DS code have a bug?
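For what it's worth, here is a minimal sketch (a hypothetical helper, not the actual DolphinScheduler code) of the kind of fallback that would avoid this: parse resource.manager.httpaddress.port defensively and fall back to 8088 when it is blank, so the first %s in yarn.application.status.address is always filled with a real port.

// Hypothetical sketch, not DolphinScheduler source: resolve the ResourceManager HTTP port
// with a default of 8088 when resource.manager.httpaddress.port is left empty, then build
// the application status URL from the yarn.application.status.address template.
public final class YarnStatusUrlSketch {

    private static final int DEFAULT_RM_HTTP_PORT = 8088;

    static int resolveRmHttpPort(String rawPortProperty) {
        if (rawPortProperty == null || rawPortProperty.trim().isEmpty()) {
            return DEFAULT_RM_HTTP_PORT;   // avoids Integer.parseInt("") -> NumberFormatException
        }
        return Integer.parseInt(rawPortProperty.trim());
    }

    static String applicationStatusUrl(String template, String rawPortProperty, String applicationId) {
        // template example: http://hadoop47:%s/ws/v1/cluster/apps/%s
        return String.format(template, resolveRmHttpPort(rawPortProperty), applicationId);
    }

    public static void main(String[] args) {
        // With the port property empty, the URL still gets a usable port instead of falling back to :80.
        System.out.println(applicationStatusUrl(
                "http://hadoop47:%s/ws/v1/cluster/apps/%s",
                "",
                "application_1634958933716_0113"));
        // prints: http://hadoop47:8088/ws/v1/cluster/apps/application_1634958933716_0113
    }
}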
The image shows that when resource.manager.httpaddress.port in common.properties is empty, DolphinScheduler cannot build a working connection to YARN; it connects to the ResourceManager host with no port in the YARN URL. I think this is a bug; at the very least the documentation should be updated so people can avoid this problem.
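As a quick way to see the difference (my own suggestion, assuming the standard YARN ResourceManager REST API on its default web port), compare the URL the worker ends up calling with the one that includes the port:

curl http://hadoop47/ws/v1/cluster/apps/application_1634958933716_0113        # defaults to port 80: connection refused, as in the worker log
curl http://hadoop47:8088/ws/v1/cluster/apps/application_1634958933716_0113   # default RM web port: returns the application report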
@lenboo What do you think? Is this the same issue we discussed yesterday in the WeChat group?
Don't you test this feature yourselves? These feel like very basic errors that reproduce every time, especially when upgrading from an older version: the old version did not have this parameter, and the new version is supposed to work even without it by default.