DTStack / chunjun

A data integration framework
https://dtstack.github.io/chunjun/
Apache License 2.0

FlinkX 1.12 + Hadoop 3.0.0: starting a job in yarn-session mode fails #567

Closed: wushuoyouting closed this issue 2 years ago

wushuoyouting commented 2 years ago

Environment: Flink 1.12.5, FlinkX 1.12, Hadoop 3.0.0, CDH 6.3.2

Problem description:

Starting the FlinkX test job in yarn-session mode fails with: No flink session found on yarn cluster
[root@server001 ~]# 
[root@server001 ~]# $FLINKX_HOME/bin/flinkx \
>  -mode yarn-session \
>  -jobType sync \
>  -job /root/flinkx/flinkx-examples/json/stream/stream.json
flinkx starting ...
tail: cannot open ‘./nohup.out’ for reading: No such file or directory
tail: ‘./nohup.out’ has appeared;  following end of new file
nohup: appending output to ‘nohup.out’
2021-12-16 17:31:08.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, server001
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 1
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.storageDir, hdfs://server001:8020/flink/ha/
2021-12-16 17:31:08.475 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, server001:2181,server002:2181,server003:2181
2021-12-16 17:31:08.476 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-12-16 17:31:08.476 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: rest.bind-port, 50100-50200
2021-12-16 17:31:08.476 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: classloader.resolve-order, parent-first
log4j:ERROR Could not find value for key log4j.appender.logfile
log4j:ERROR Could not instantiate appender named "logfile".
2021-12-16 17:31:08,726 - 0    WARN  [main] org.apache.hadoop.util.NativeCodeLoader:Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-12-16 17:31:08,828 - 102  INFO  [main] org.apache.hadoop.yarn.client.RMProxy:Connecting to ResourceManager at server001/10.0.10.100:8032
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: No flink session found on yarn cluster.
    at com.dtstack.flinkx.client.yarn.YarnSessionClusterClientHelper.submit(YarnSessionClusterClientHelper.java:113)
    at com.dtstack.flinkx.client.Launcher.main(Launcher.java:126)
Caused by: java.lang.RuntimeException: No flink session found on yarn cluster.
    at com.dtstack.flinkx.client.yarn.YarnSessionClusterClientHelper.submit(YarnSessionClusterClientHelper.java:82)
    ... 1 more
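Before digging into FlinkX, it can help to confirm what a session lookup would actually see on the ResourceManager. The sketch below is a hypothetical helper (`find_flink_sessions` is an illustrative name, not part of FlinkX or YARN) that reads `yarn application -list` output and prints the IDs of RUNNING applications named `Flink session cluster` with type `Apache Flink`, the values shown in the listings in this issue. It does not reproduce FlinkX's own lookup, whose exact filter conditions are not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of FlinkX or YARN): print the IDs of running
# Flink session clusters found in `yarn application -list` output on stdin.
# The matched name/type/state strings are the ones shown in this issue.
find_flink_sessions() {
    awk '
        # Data rows start with the application ID; header rows do not.
        $1 ~ /^application_/ &&
        /Flink session cluster/ &&
        /Apache Flink/ &&
        /RUNNING/ { print $1 }
    '
}

# Usage on a live cluster:
#   yarn application -list | find_flink_sessions
```

Non-data lines such as the `WARNING`/`INFO` client messages and the column header are ignored by the filter, so an empty result means no running session matched.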

To rule out problems with my environment or with how I am testing, I ran the following test jobs.

1. Ruling out Flink + Hadoop problems

System environment variables

[root@server001 ~]# cat ~/.bashrc
# .bashrc
# User specific aliases and functions
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
export LC_ALL=zh_CN.utf8
export LANG=zh_CN.utf8
export LANGUAGE=zh_CN.utf8
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export JAVA_LIBRAY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
export YARN_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-yarn
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
export HADOOP_CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath`
export ZOOKEEPER_HOMT=/opt/cloudera/parcels/CDH/lib/zookeeper
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$YARN_HOME/bin:$YARN_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOMT/bin

export FLINK_HOME=/opt/cloudera/parcels/FLINK/lib/flink/
export PATH=$PATH:$FLINK_HOME/bin

export FLINKX_HOME=/root/flinkx
export PATH=$PATH:$FLINKX_HOME/bin
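Because both the Flink CLI and the FlinkX launcher depend on these variables, a quick pre-flight check can rule out a typo or a missing directory. The function below is a hypothetical helper (`check_dirs_env` is an illustrative name, not shipped with FlinkX); it only checks that each named variable is set and points to an existing directory.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not shipped with FlinkX): for each variable
# name given, report it if it is unset/empty or not an existing directory.
# Returns 0 only when every variable passes.
check_dirs_env() {
    failures=0
    for name in "$@"; do
        # POSIX-compatible indirect expansion: value = contents of $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            failures=$((failures + 1))
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Usage:
#   check_dirs_env JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR FLINK_HOME FLINKX_HOME
```

Note that the FlinkX launcher's own log in section 2.1 reports `-hadoopConfDir /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop` rather than the `/etc/hadoop/conf` exported here, so that directory may be worth checking as well.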

1.1 Run a test job on a Flink yarn-session

$FLINK_HOME/bin/yarn-session.sh -n 2 -jm 1024 -tm 1024
[root@server001 ~]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/12/16 17:39:39 INFO client.RMProxy: Connecting to ResourceManager at server001/10.0.10.100:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1639465575569_0016  Flink session cluster           Apache Flink          root  root.users.root            RUNNING           UNDEFINED             100%              http://server003:50100
[root@server001 ~]# $FLINK_HOME/bin/flink run $FLINK_HOME/examples/batch/WordCount.jar -input hdfs://server001:8020//test/input/wordcount.txt -output hdfs://server001:8020/test/output/2021121601
Setting HBASE_CONF_DIR=/etc/hbase/conf because no HBASE_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/FLINK-1.12.5-BIN-SCALA_2.12/lib/flink/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-12-16 17:39:44,617 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2021-12-16 17:39:44,617 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Found Yarn properties file under /tmp/.yarn-properties-root.
2021-12-16 17:39:45,628 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/opt/cloudera/parcels/FLINK-1.12.5-BIN-SCALA_2.12/lib/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2021-12-16 17:39:45,654 INFO  org.apache.hadoop.yarn.client.RMProxy                        [] - Connecting to ResourceManager at server001/10.0.10.100:8032
2021-12-16 17:39:45,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-12-16 17:39:45,852 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface server003:50100 of application 'application_1639465575569_0016'.
Job has been submitted with JobID 152f7a2de9ef3c1e96f40f00a878c357
Program execution finished
Job with JobID 152f7a2de9ef3c1e96f40f00a878c357 has finished.
Job Runtime: 9213 ms
# The job ran successfully
[root@server001 ~]# hdfs dfs -cat /test/output/2021121601
apple 3
flink 1
hadoop 2
pear 2
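To cross-check these counts independently of Flink, the input file can be tallied with standard tools. This is a rough local approximation of the WordCount example (lower-casing, splitting on non-word characters); `count_words` is a hypothetical helper, and its tokenisation is an assumption rather than Flink's exact code.

```shell
#!/bin/sh
# Hypothetical helper: print "word count" lines, sorted by word, in the same
# layout as the WordCount output above. Tokenisation approximates WordCount:
# lower-case the text, treat any character outside [a-z0-9_] as a separator,
# and drop the empty tokens left by leading separators (awk 'NF').
count_words() {
    tr 'A-Z' 'a-z' |
        tr -cs 'a-z0-9_' '\n' |
        awk 'NF' |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}

# Usage:
#   hdfs dfs -cat /test/input/wordcount.txt | count_words
```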

1.2 Run a Flink test job in yarn per-job mode

[root@server001 ~]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/12/16 17:49:21 INFO client.RMProxy: Connecting to ResourceManager at server001/10.0.10.100:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
[root@server001 ~]# $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 1024 -ytm 1024 $FLINK_HOME/examples/batch/WordCount.jar -input hdfs://server001:8020/test/input/wordcount.txt  -output hdfs://server001:8020/test/output/2021121603
Setting HBASE_CONF_DIR=/etc/hbase/conf because no HBASE_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/FLINK-1.12.5-BIN-SCALA_2.12/lib/flink/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-12-16 17:49:27,048 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/opt/cloudera/parcels/FLINK-1.12.5-BIN-SCALA_2.12/lib/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2021-12-16 17:49:27,079 INFO  org.apache.hadoop.yarn.client.RMProxy                        [] - Connecting to ResourceManager at server001/10.0.10.100:8032
2021-12-16 17:49:27,243 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-12-16 17:49:27,323 INFO  org.apache.hadoop.conf.Configuration                         [] - resource-types.xml not found
2021-12-16 17:49:27,323 INFO  org.apache.hadoop.yarn.util.resource.ResourceUtils           [] - Unable to find 'resource-types.xml'.
2021-12-16 17:49:27,360 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, slotsPerTaskManager=1}
2021-12-16 17:49:32,338 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1639465575569_0019
2021-12-16 17:49:32,361 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1639465575569_0019
2021-12-16 17:49:32,361 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
2021-12-16 17:49:32,362 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
2021-12-16 17:49:37,380 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2021-12-16 17:49:37,381 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface server003:50100 of application 'application_1639465575569_0019'.
Job has been submitted with JobID c8a7d34bbfc071668597cd4d32ebb58a
Program execution finished
Job with JobID c8a7d34bbfc071668597cd4d32ebb58a has finished.
Job Runtime: 12250 ms

Exception in thread "Thread-7" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
    at org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2647)
    at org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:2905)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2864)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2838)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2715)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1186)
    at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1774)
    at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
    at org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
    at org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)

[root@server001 ~]# hdfs dfs -cat /test/output/2021121603
apple 3
flink 1
hadoop 2
pear 2
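The `IllegalStateException` above is thrown from a Hadoop shutdown hook (`ShutdownHookManager`) after the job has already finished, and the `hdfs dfs -cat` output shows the results were written correctly. The message itself names `classloader.check-leaked-classloader` as the option that disables this check. Below is a minimal sketch for setting such a key in `flink-conf.yaml`, assuming the default `$FLINK_HOME/conf/flink-conf.yaml` location; `set_flink_conf` is a hypothetical helper, and the file should be backed up before editing.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a flink-conf.yaml-style file,
# replacing an existing top-level line for the key or appending a new one.
# Values containing '|' or '&' are not escaped for sed.
set_flink_conf() {
    key=$1
    value=$2
    conf=$3
    # Escape dots so the key is matched literally by grep/sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${pattern}:" "$conf"; then
        tmp=$(mktemp) || return 1
        sed "s|^${pattern}:.*|${key}: ${value}|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"   # rewrite in place, keeping file permissions
        rm -f "$tmp"
    else
        # Make sure the new entry starts on its own line.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf '%s: %s\n' "$key" "$value" >> "$conf"
    fi
}

# Usage (back up the file first):
#   cp "$FLINK_HOME/conf/flink-conf.yaml" "$FLINK_HOME/conf/flink-conf.yaml.bak"
#   set_flink_conf classloader.check-leaked-classloader false "$FLINK_HOME/conf/flink-conf.yaml"
```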

The tests above show that Flink + Hadoop work correctly.

2. Ruling out FlinkX problems

2.1 Run FlinkX in yarn per-job mode

[root@server001 ~]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/12/16 17:54:10 INFO client.RMProxy: Connecting to ResourceManager at server001/10.0.10.100:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
[root@server001 ~]# $FLINKX_HOME/bin/flinkx \
> -mode yarn-per-job \
> -jobType sync \
> -job $FLINKX_HOME/flinkx-examples/json/stream/stream.json 
flinkx starting ...
nohup: appending output to ‘nohup.out’
2021-12-16 17:54:38.319 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, server001
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 1
2021-12-16 17:54:38.321 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2021-12-16 17:54:38.322 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.storageDir, hdfs://server001:8020/flink/ha/
2021-12-16 17:54:38.322 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, server001:2181,server002:2181,server003:2181
2021-12-16 17:54:38.322 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-12-16 17:54:38.322 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: rest.bind-port, 50100-50200
2021-12-16 17:54:38.322 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: classloader.resolve-order, parent-first
log4j:ERROR Could not find value for key log4j.appender.logfile
log4j:ERROR Could not instantiate appender named "logfile".
2021-12-16 17:54:38,557 - 0    WARN  [main] org.apache.hadoop.util.NativeCodeLoader:Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-12-16 17:54:38.662 [main] INFO  org.apache.flink.runtime.security.modules.HadoopModule  - Hadoop user set to root (auth:SIMPLE)
2021-12-16 17:54:38.680 [main] INFO  org.apache.flink.runtime.security.modules.JaasModule  - Jaas file will be created as /tmp/jaas-6554950734081603818.conf.
2021-12-16 17:54:38,782 - 225  INFO  [main] org.apache.hadoop.yarn.client.RMProxy:Connecting to ResourceManager at server001/10.0.10.100:8032
2021-12-16 17:54:38.866 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - ------------program params-------------------------
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -flinkLibDir
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - /opt/cloudera/parcels/FLINK/lib/flink/lib
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -p
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - 
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -job
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - {
  "job": {
    "content": [
      {
        "reader": {
          "parameter": {
            "column": [
              {
                "name": "id",
                "type": "id"
              },
              {
                "name": "name",
                "type": "string"
              },
              {
                "name": "content",
                "type": "string"
              }
            ],
            "sliceRecordCount": ["30"],
            "permitsPerSecond": 1
          },
          "table": {
            "tableName": "sourceTable"
          },
          "name": "streamreader"
        },
        "writer": {
          "parameter": {
            "column": [
              {
                "name": "id",
                "type": "id"
              },
              {
                "name": "name",
                "type": "string"
              },
              {
                "name": "content",
                "type": "timestamp"
              }
            ],
            "print": true
          },
          "table": {
            "tableName": "sinkTable"
          },
          "name": "streamwriter"
        },
        "transformer": {
          "transformSql": "select id,name, NOW() from sourceTable where CHAR_LENGTH(name) < 50 and CHAR_LENGTH(content) < 50"
        }
      }
    ],
    "setting": {
      "errorLimit": {
        "record": 100
      },
      "speed": {
        "bytes": 0,
        "channel": 1,
        "readerChannel": 1,
        "writerChannel": 1
      }
    }
  }
}

2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -jobName
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - Flink_Job
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -flinkxDistDir
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - /root/flinkx/flinkx-dist
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -jobType
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - sync
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -hadoopConfDir
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
2021-12-16 17:54:39.114 [main] INFO  com.dtstack.flinkx.Main  - -confProp
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - {}
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - -pluginLoadMode
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - shipfile
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - -mode
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - yarn-per-job
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - -flinkConfDir
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - /opt/cloudera/parcels/FLINK/lib/flink/conf
2021-12-16 17:54:39.115 [main] INFO  com.dtstack.flinkx.Main  - -------------------------------------------
2021-12-16 17:54:39.117 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, server001
2021-12-16 17:54:39.117 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2021-12-16 17:54:39.117 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-12-16 17:54:39.117 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-12-16 17:54:39.117 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 1
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.storageDir, hdfs://server001:8020/flink/ha/
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, server001:2181,server002:2181,server003:2181
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: rest.bind-port, 50100-50200
2021-12-16 17:54:39.118 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: classloader.resolve-order, parent-first
2021-12-16 17:54:40.390 [main] INFO  com.dtstack.flinkx.classloader.ClassLoaderManager  - jarUrl:file:/root/flinkx/flinkx-dist/connector/stream/flinkx-connector-stream-master.jar create ClassLoad successful...
2021-12-16 17:54:42.377 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1, priority=0}
2021-12-16 17:54:42.380 [main] WARN  org.apache.flink.core.plugin.PluginConfig  - The plugins directory [plugins] does not exist.
2021-12-16 17:54:46.032 [main] WARN  org.apache.flink.core.plugin.PluginConfig  - The plugins directory [plugins] does not exist.
2021-12-16 17:54:47.606 [main] INFO  o.apache.flink.runtime.util.config.memory.ProcessMemoryUtils  - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2021-12-16 17:54:47.615 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Submitting application master application_1639465575569_0020
2021-12-16 17:54:47,676 - 9119 INFO  [main] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl:Submitted application application_1639465575569_0020
2021-12-16 17:54:47.676 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Waiting for the cluster to be allocated
2021-12-16 17:54:47.677 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deploying cluster, current state ACCEPTED
2021-12-16 17:54:52.946 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - YARN application has been deployed successfully.
2021-12-16 17:54:52.947 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1639465575569_0020
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1639465575569_0020
Note that killing Flink might not clean up all job artifacts and temporary files.
2021-12-16 17:54:52.947 [main] INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Found Web Interface server003:50100 of application 'application_1639465575569_0020'.
2021-12-16 17:54:52.960 [main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils  - Enforcing default ACL for ZK connections
2021-12-16 17:54:52.960 [main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils  - Using '/flink/application_1639465575569_0020' as Zookeeper namespace.
2021-12-16 17:54:53.008 [main] INFO  o.a.f.shaded.curator4.org.apache.curator.utils.Compatibility  - Running in ZooKeeper 3.4.x compatibility mode
2021-12-16 17:54:53.009 [main] INFO  o.a.f.shaded.curator4.org.apache.curator.utils.Compatibility  - Using emulated InjectSessionExpiration
2021-12-16 17:54:53.031 [main] INFO  o.a.f.s.c.o.a.curator.framework.imps.CuratorFrameworkImpl  - Starting
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:host.name=server001
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.version=1.8.0_181
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.vendor=Oracle Corporation
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.home=/usr/java/jdk1.8.0_181-cloudera/jre
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.class.path=/root/flinkx/lib/flinkx-clients-master.jar
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.io.tmpdir=/tmp
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:java.compiler=<NA>
2021-12-16 17:54:53.036 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:os.name=Linux
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:os.arch=amd64
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:os.version=3.10.0-957.21.3.el7.x86_64
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:user.name=root
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:user.home=/root
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Client environment:user.dir=/root
2021-12-16 17:54:53.037 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=server001:2181,server002:2181,server003:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState@43090195
2021-12-16 17:54:53.047 [main-SendThread(server002:2181)] WARN  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-6554950734081603818.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2021-12-16 17:54:53.048 [main] INFO  o.a.f.s.c.o.a.curator.framework.imps.CuratorFrameworkImpl  - Default schema
2021-12-16 17:54:53.048 [main-SendThread(server002:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Opening socket connection to server server002/10.0.10.101:2181
2021-12-16 17:54:53.049 [main-EventThread] ERROR o.a.flink.shaded.curator4.org.apache.curator.ConnectionState  - Authentication failed
2021-12-16 17:54:53.051 [main-SendThread(server002:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Socket connection established to server002/10.0.10.101:2181, initiating session
2021-12-16 17:54:53.059 [main-SendThread(server002:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Session establishment complete on server server002/10.0.10.101:2181, sessionid = 0x17db7c2d0ca0fa8, negotiated timeout = 40000
2021-12-16 17:54:53.060 [main-EventThread] INFO  o.a.f.s.c.o.a.curator.framework.state.ConnectionStateManager  - State change: CONNECTED
2021-12-16 17:54:53.140 [main] INFO  o.a.f.runtime.leaderretrieval.DefaultLeaderRetrievalService  - Starting DefaultLeaderRetrievalService with ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/rest_server_lock'}.
2021-12-16 17:54:53.140 [main] INFO  com.dtstack.flinkx.client.yarn.YarnPerJobClusterClientHelper  - deploy per_job with appId: application_1639465575569_0020}, jobId: afebe7eaab500f3cefa1481640e9273d
2021-12-16 17:54:53.140 [main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils  - Enforcing default ACL for ZK connections
2021-12-16 17:54:53.140 [main] INFO  org.apache.flink.runtime.util.ZooKeeperUtils  - Using '/flink/application_1639465575569_0020' as Zookeeper namespace.
2021-12-16 17:54:53.140 [main] INFO  o.a.f.s.c.o.a.curator.framework.imps.CuratorFrameworkImpl  - Starting
2021-12-16 17:54:53.141 [main] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=server001:2181,server002:2181,server003:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState@3883031d
2021-12-16 17:54:53.142 [main-SendThread(server001:2181)] WARN  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-6554950734081603818.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2021-12-16 17:54:53.142 [main] INFO  o.a.f.s.c.o.a.curator.framework.imps.CuratorFrameworkImpl  - Default schema
2021-12-16 17:54:53.143 [main-SendThread(server001:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Opening socket connection to server server001/10.0.10.100:2181
2021-12-16 17:54:53.143 [main-EventThread] ERROR o.a.flink.shaded.curator4.org.apache.curator.ConnectionState  - Authentication failed
2021-12-16 17:54:53.143 [main-SendThread(server001:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Socket connection established to server001/10.0.10.100:2181, initiating session
2021-12-16 17:54:53.150 [main-SendThread(server001:2181)] INFO  o.a.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn  - Session establishment complete on server server001/10.0.10.100:2181, sessionid = 0x37db7c3106f0f99, negotiated timeout = 40000
2021-12-16 17:54:53.150 [main-EventThread] INFO  o.a.f.s.c.o.a.curator.framework.state.ConnectionStateManager  - State change: CONNECTED
2021-12-16 17:54:53.156 [main] INFO  o.a.f.runtime.leaderretrieval.DefaultLeaderRetrievalService  - Starting DefaultLeaderRetrievalService with ZookeeperLeaderRetrievalDriver{retrievalPath='/leader/rest_server_lock'}.

The test above shows that FlinkX works correctly in yarn-perjob mode.

2.2 Testing yarn-session mode

$FLINK_HOME/bin/yarn-session.sh -n 1 -s 1 -jm 1024 -tm 1024
[root@server001 ~]# yarn application -list
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/12/16 18:01:28 INFO client.RMProxy: Connecting to ResourceManager at server001/10.0.10.100:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1639465575569_0021  Flink session cluster           Apache Flink          root  root.users.root            RUNNING           UNDEFINED             100%              http://server002:50100
[root@server001 ~]# $FLINKX_HOME/bin/flinkx \
>     -mode yarn-session \
>         -jobType sync \
>     -job $FLINKX_HOME/flinkx-examples/json/stream/stream.json \
>         -pluginLoadMode classpath
flinkx starting ...
nohup: appending output to ‘nohup.out’
2021-12-16 18:02:10.469 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, server001
2021-12-16 18:02:10.471 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2021-12-16 18:02:10.471 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 1
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.storageDir, hdfs://server001:8020/flink/ha/
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, server001:2181,server002:2181,server003:2181
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-12-16 18:02:10.472 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: rest.bind-port, 50100-50200
2021-12-16 18:02:10.473 [main] INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: classloader.resolve-order, parent-first
log4j:ERROR Could not find value for key log4j.appender.logfile
log4j:ERROR Could not instantiate appender named "logfile".
2021-12-16 18:02:10,709 - 0    WARN  [main] org.apache.hadoop.util.NativeCodeLoader:Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-12-16 18:02:10,808 - 99   INFO  [main] org.apache.hadoop.yarn.client.RMProxy:Connecting to ResourceManager at server001/10.0.10.100:8032
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: No flink session found on yarn cluster.
    at com.dtstack.flinkx.client.yarn.YarnSessionClusterClientHelper.submit(YarnSessionClusterClientHelper.java:113)
    at com.dtstack.flinkx.client.Launcher.main(Launcher.java:126)
Caused by: java.lang.RuntimeException: No flink session found on yarn cluster.
    at com.dtstack.flinkx.client.yarn.YarnSessionClusterClientHelper.submit(YarnSessionClusterClientHelper.java:82)
    ... 1 more

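The test failed. Could someone take a look and tell me whether this is a problem with my configuration or with FlinkX itself? Thanks in advance for any replies.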

wushuoyouting commented 2 years ago

2.1 yarn-perjob test results and logs

wushuoyouting commented 2 years ago

2.1 yarn-perjob test results and logs (a screenshot, clipboard.png, and the log file application_1639465575569_0020.txt were being attached, but the uploads did not complete)

wushuoyouting commented 2 years ago

2.1 yarn-perjob test results and logs (screenshot attached; the log file application_1639465575569_0020.txt did not finish uploading)

wushuoyouting commented 2 years ago

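It turns out the yarn-session job has to be submitted to your own YARN queue, i.e. the queue the Flink session cluster is running in (root.users.root in the yarn application -list output above):

$FLINKX_HOME/bin/flinkx \
    -mode yarn-session \
    -jobType sync \
    -job $FLINKX_HOME/flinkx-examples/json/stream/stream.json \
    -pluginLoadMode classpath \
    -confProp "{\"yarn.application.queue\":root.users.root}"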
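The fix comes down to passing the same queue that the Flink session cluster runs in. Below is a minimal shell sketch (not part of FlinkX) that reads `yarn application -list` output and prints the queue of every RUNNING "Flink session cluster" application. It assumes the column layout shown in this issue; the function name `flink_session_queues` is hypothetical, and the premise that FlinkX only matches sessions in the configured `yarn.application.queue` is inferred from the fix above rather than checked against the FlinkX source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the YARN queue of each RUNNING "Flink session cluster".
# Input: text printed by `yarn application -list`, read from stdin.
# Assumption: columns end with ... User, Queue, State, Final-State, Progress, Tracking-URL
# (as in this issue). Application-Name and Application-Type contain spaces, so
# fields are counted from the end of the line: $(NF-3) is State, $(NF-4) is Queue.
flink_session_queues() {
  awk '$1 ~ /^application_/ && /Flink session cluster/ && $(NF-3) == "RUNNING" { print $(NF-4) }'
}

# Demo using the `yarn application -list` output captured in this issue.
sample='                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1639465575569_0021  Flink session cluster           Apache Flink          root  root.users.root            RUNNING           UNDEFINED             100%              http://server002:50100'

printf '%s\n' "$sample" | flink_session_queues
```

On a live cluster you would pipe the real output instead, e.g. `queue=$(yarn application -list 2>/dev/null | flink_session_queues | head -n 1)`, and pass it as `-confProp "{\"yarn.application.queue\":\"$queue\"}"`. Quoting the value keeps the JSON string valid; the header, WARNING and INFO lines are skipped because their first field does not start with `application_`.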

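It seems I didn't ask my question the right way. On 2021-12-23 I also saw someone in the group chat ask about this same problem, and only 渡劫 replied.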