alibaba / x-deeplearning

An industrial deep learning framework for high-dimension sparse data
Apache License 2.0

xdl_submit.py --config=config.json submission reports FAILED #86

Open bboy-yang opened 5 years ago

bboy-yang commented 5 years ago

hadoop bin /home/hadoop/hadoop-3.1.1/bin/hadoop
CMD: /home/hadoop/hadoop-3.1.1/bin/hadoop jar /usr/bin/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.alibaba.xdl.Client -c=config.json -f=/usr/bin/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar -uuid=4911031c-b68a-4bd8-9b27-747956171b4e
2019-01-22 20:27:53,067 INFO xdl.Client: Yarn client start success.
2019-01-22 20:27:53,162 INFO xdl.Client: Create application with id:[application_1547717863709_0021] success.
2019-01-22 20:27:53,893 INFO xdl.Utils: Path:[hdfs://searchns1/user/admin/.xdl/application_1547717863709_0021] not exists, create success.
2019-01-22 20:27:53,893 INFO xdl.Client: Application base path:[hdfs://searchns1/user/admin/.xdl/application_1547717863709_0021/].
2019-01-22 20:27:54,163 INFO xdl.Client: Upload file config.json to hdfs:/searchns1/user/admin/.xdl/application_1547717863709_0021/config.json success.
2019-01-22 20:27:54,163 INFO xdl.Client: begin to upload files to hdfs
2019-01-22 20:27:54,242 INFO xdl.Client: Upload file /usr/bin/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar to hdfs:/searchns1/user/admin/.xdl/application_1547717863709_0021/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar success.
2019-01-22 20:28:12,630 INFO xdl.Utils: Run cmd [tar -czf /tmp/xdl_local/4911031c-b68a-4bd8-9b27-747956171b4e/songyang31.tar.gz -C /home/admin ./songyang31] success.
2019-01-22 20:28:13,426 INFO xdl.Client: Upload file /home/admin/songyang31/ to hdfs://searchns1/user/admin/.xdl/application_1547717863709_0021/songyang31.tar.gz success.
2019-01-22 20:28:13,426 INFO xdl.Client: finish uploading files to hdfs
2019-01-22 20:28:13,426 INFO xdl.Client: Upload user files success.
2019-01-22 20:28:13,426 INFO xdl.Client: ApplicationMaster start command is: [$JAVA_HOME/bin/java -Xmx256M com.alibaba.xdl.AppMasterRunner -c=config.json -v=songyang31.tar.gz -u=admin -p=hdfs://searchns1/user/admin/.xdl/application_1547717863709_0021/ 1>/stdout 2>/stderr]
2019-01-22 20:28:13,535 INFO xdl.Client: local resources: {config.json=resource { scheme: "hdfs" host: "searchns1" port: -1 file: "/user/admin/.xdl/application_1547717863709_0021/config.json" } size: 485 timestamp: 1548160074127 type: FILE visibility: PUBLIC, xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar=resource { scheme: "hdfs" host: "searchns1" port: -1 file: "/user/admin/.xdl/application_1547717863709_0021/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar" } size: 4784145 timestamp: 1548160074228 type: FILE visibility: PUBLIC}
2019-01-22 20:28:13,543 INFO xdl.Client: Master add CLASSPATH:$HADOOP_CONF_DIR
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_COMMON_HOME/share/hadoop/common/
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_COMMON_HOME/share/hadoop/common/lib/
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_HDFS_HOME/share/hadoop/hdfs/
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_YARN_HOME/share/hadoop/yarn/
2019-01-22 20:28:13,544 INFO xdl.Client: Master add CLASSPATH:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/
2019-01-22 20:28:13,544 INFO xdl.Client: Setup ApplicationMaster container success.
2019-01-22 20:28:13,567 INFO conf.Configuration: resource-types.xml not found
2019-01-22 20:28:13,567 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-01-22 20:28:13,572 INFO xdl.Client: Setup application context success.
2019-01-22 20:28:13,572 INFO xdl.Client: Submitting application application_1547717863709_0021
2019-01-22 20:28:13,830 INFO impl.YarnClientImpl: Submitted application application_1547717863709_0021
2019-01-22 20:28:13,832 INFO xdl.Client: AppMaster host N/A
Start waiting application: application_1547717863709_0021 ends.
2019-01-22 20:28:17,186 INFO xdl.Client: Application application_1547717863709_0021 finish with state FINISHED
2019-01-22 20:28:17,189 INFO xdl.Utils: ================================FINAL STATUS==================================
2019-01-22 20:28:17,189 INFO xdl.Utils: application_1547717863709_0021 : FAILED
2019-01-22 20:28:17,189 INFO xdl.Utils: ================================FINAL STATUS==================================
2019-01-22 20:28:17,216 INFO xdl.Utils: Delete the hdfs dir:hdfs://searchns1/user/admin/.xdl/application_1547717863709_0021/ success.

1. The submission above reports FAILED. What is the problem?
2. Running yarn logs -applicationId application_1547717863709_0021 prints the following. How can I resolve it?

File /tmp/logs/admin/logs/application_1547717863709_0021 does not exist.

Can not find any log file matching the pattern: [ALL] for the application: application_1547717863709_0021
Can not find the logs for the application: application_1547717863709_0021 with the appOwner: admin

yiling-dc commented 5 years ago

Please provide the AppMaster log.

bboy-yang commented 5 years ago

2019-01-23 11:16:23,820 INFO conf.Configuration: resource-types.xml not found
2019-01-23 11:16:23,820 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-01-23 11:16:23,840 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2019-01-23 11:16:23,852 INFO xdl.AppMasterBase: Zookeeper connect string is:[11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181]
2019-01-23 11:16:23,908 INFO imps.CuratorFrameworkImpl: Starting
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:host.name=host-11-3-220-137
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_121
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.home=/export/servers/jdk1.8.0_121/jre
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/home/hadoop/hadoop-3.1.1/etc/hadoop:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-kms-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share
/hadoop/common/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/li
b/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-xml-9.3.19.v20170502.jar:/h
ome/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/h
adoop/hdfs/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okio-1.6.0.jar:/home/hadoop/hadoop-3.1.1/
share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.
1.1/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-ajax-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-core-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-tests-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/s
hare/hadoop/yarn/hadoop-yarn-registry-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-router-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/snakeyaml-1.16.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/java-util-1.9.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-client-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/fst-2.50.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/objenesis-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-base-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/json-io-2.5.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1
.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/dnsjava-2.1.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-4.0.jar:/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548213175912_0001/container_e04_1548213175912_0001_01_000001/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.28.3.el7.x86_64
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.dir=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548213175912_0001/container_e04_1548213175912_0001_01_000001
2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@76508ed1
2019-01-23 11:16:23,933 INFO zookeeper.ClientCnxn: Opening socket connection to server 11.3.221.39/11.3.221.39:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-23 11:16:23,942 INFO zookeeper.ClientCnxn: Socket connection established to 11.3.221.39/11.3.221.39:2181, initiating session
2019-01-23 11:16:23,961 INFO zookeeper.ClientCnxn: Session establishment complete on server 11.3.221.39/11.3.221.39:2181, sessionid = 0x267dffab8d1000d, negotiated timeout = 40000
2019-01-23 11:16:23,983 INFO state.ConnectionStateManager: State change: CONNECTED
2019-01-23 11:16:24,019 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 11:16:25,091 INFO xdl.AppMasterBase: ResourceManager client started.
2019-01-23 11:16:25,276 ERROR xdl.AppMasterRunner: run error!
org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 'yarn.io/gpu'. Known resources are [name: memory-mb, units: Mi, type: COUNTABLE, value: 8192, minimum allocation: 0, maximum allocation: 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 4, minimum allocation: 0, maximum allocation: 9223372036854775807]
    at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:269)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getResourceInformation(ResourcePBImpl.java:208)
    at org.apache.hadoop.yarn.api.records.Resource.getResourceValue(Resource.java:306)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getResourceValue(ResourcePBImpl.java:214)
    at com.alibaba.xdl.AppMasterBase.createRMClient(AppMasterBase.java:1046)
    at com.alibaba.xdl.AppMasterBase.run(AppMasterBase.java:156)
    at com.alibaba.xdl.AppMasterRunner.main(AppMasterRunner.java:81)
2019-01-23 11:16:25,299 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2019-01-23 11:16:25,403 ERROR xdl.AppMasterBase: deal with exit error!
2019-01-23 11:16:25,440 INFO imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2019-01-23 11:16:25,475 INFO zookeeper.ZooKeeper: Session: 0x267dffab8d1000d closed
2019-01-23 11:16:25,476 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x267dffab8d1000d
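
The only ERROR entry with a stack trace in the log above is the ResourceNotFoundException, which is easy to miss behind the long classpath dump. As a minimal sketch (the helper name parse_unknown_resource is hypothetical, and the message format is assumed to match the line copied from this log), the following pulls out the resource the AppMaster asked for and the resources the cluster reports:

```python
import re

# Error message copied from the AppMaster log above (stack trace omitted).
LOG_LINE = (
    "2019-01-23 11:16:25,276 ERROR xdl.AppMasterRunner: run error! "
    "org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: "
    "Unknown resource 'yarn.io/gpu'. Known resources are "
    "[name: memory-mb, units: Mi, type: COUNTABLE, value: 8192, "
    "minimum allocation: 0, maximum allocation: 9223372036854775807, "
    "name: vcores, units: , type: COUNTABLE, value: 4, "
    "minimum allocation: 0, maximum allocation: 9223372036854775807]"
)


def parse_unknown_resource(line):
    """Hypothetical helper: return (unknown_resource, known_names) or None."""
    match = re.search(
        r"Unknown resource '([^']+)'\. Known resources are \[(.*?)\]", line
    )
    if match is None:
        return None
    # Each known resource is listed as "name: <resource-name>, units: ...".
    known_names = re.findall(r"name: ([^,\]]+)", match.group(2))
    return match.group(1), known_names


if __name__ == "__main__":
    print(parse_unknown_resource(LOG_LINE))
```

For this log the requested resource is yarn.io/gpu, while the cluster reports only memory-mb and vcores, which appears consistent with the earlier "resource-types.xml not found" message.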

yiling-dc commented 5 years ago

2019-01-23 11:16:23,820 INFO conf.Configuration: resource-types.xml not found
2019-01-23 11:16:23,820 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-01-23 11:16:23,840 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2019-01-23 11:16:23,852 INFO xdl.AppMasterBase: Zookeeper connect string is:[11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181]
2019-01-23 11:16:23,908 INFO imps.CuratorFrameworkImpl: Starting
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:host.name=host-11-3-220-137
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_121
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client environment:java.home=/export/servers/jdk1.8.0_121/jre
2019-01-23 11:16:23,914 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/home/hadoop/hadoop-3.1.1/etc/hadoop:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-kms-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share
/hadoop/common/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/li
b/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-xml-9.3.19.v20170502.jar:/h
ome/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/h
adoop/hdfs/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okio-1.6.0.jar:/home/hadoop/hadoop-3.1.1/
share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.
1.1/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-ajax-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-core-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-tests-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/s
hare/hadoop/yarn/hadoop-yarn-registry-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-router-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/snakeyaml-1.16.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/java-util-1.9.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-client-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/fst-2.50.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/objenesis-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-base-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/json-io-2.5.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1
.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/dnsjava-2.1.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-4.0.jar:/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548213175912_0001/container_e04_1548213175912_0001_01_000001/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:java.compiler= 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.28.3.el7.x86_64 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Client environment:user.dir=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548213175912_0001/container_e04_1548213175912_0001_01_000001 2019-01-23 11:16:23,915 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@76508ed1 2019-01-23 11:16:23,933 INFO zookeeper.ClientCnxn: Opening socket connection to server 11.3.221.39/11.3.221.39:2181. 
Will not attempt to authenticate using SASL (unknown error)
2019-01-23 11:16:23,942 INFO zookeeper.ClientCnxn: Socket connection established to 11.3.221.39/11.3.221.39:2181, initiating session
2019-01-23 11:16:23,961 INFO zookeeper.ClientCnxn: Session establishment complete on server 11.3.221.39/11.3.221.39:2181, sessionid = 0x267dffab8d1000d, negotiated timeout = 40000
2019-01-23 11:16:23,983 INFO state.ConnectionStateManager: State change: CONNECTED
2019-01-23 11:16:24,019 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 11:16:25,091 INFO xdl.AppMasterBase: ResourceManager client started.
2019-01-23 11:16:25,276 ERROR xdl.AppMasterRunner: run error!
org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 'yarn.io/gpu'. Known resources are [name: memory-mb, units: Mi, type: COUNTABLE, value: 8192, minimum allocation: 0, maximum allocation: 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 4, minimum allocation: 0, maximum allocation: 9223372036854775807]
    at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:269)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getResourceInformation(ResourcePBImpl.java:208)
    at org.apache.hadoop.yarn.api.records.Resource.getResourceValue(Resource.java:306)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getResourceValue(ResourcePBImpl.java:214)
    at com.alibaba.xdl.AppMasterBase.createRMClient(AppMasterBase.java:1046)
    at com.alibaba.xdl.AppMasterBase.run(AppMasterBase.java:156)
    at com.alibaba.xdl.AppMasterRunner.main(AppMasterRunner.java:81)
2019-01-23 11:16:25,299 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2019-01-23 11:16:25,403 ERROR xdl.AppMasterBase: deal with exit error!
2019-01-23 11:16:25,440 INFO imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2019-01-23 11:16:25,475 INFO zookeeper.ZooKeeper: Session: 0x267dffab8d1000d closed
2019-01-23 11:16:25,476 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x267dffab8d1000d

Please refer to the official Hadoop documentation on configuring GPU resources, specifically yarn-site.xml and resource-types.xml.
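As a rough illustration of what that configuration involves: the Hadoop 3.1 "Using GPU on YARN" guide declares the GPU resource type through the `yarn.resource-types` property in resource-types.xml, and enables the NodeManager plugin through `yarn.nodemanager.resource-plugins` in yarn-site.xml. The sketch below only checks whether a resource-types.xml declares `yarn.io/gpu`. The sample XML string and the helper names `hadoop_properties` and `declares_gpu` are made up for this example and are not part of XDL or Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a resource-types.xml that declares the GPU resource.
SAMPLE_RESOURCE_TYPES = """<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>
"""


def hadoop_properties(xml_text):
    """Return a name -> value dict for a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def declares_gpu(xml_text):
    """True if yarn.resource-types lists yarn.io/gpu (comma-separated list)."""
    value = hadoop_properties(xml_text).get("yarn.resource-types", "")
    return "yarn.io/gpu" in [item.strip() for item in value.split(",")]


if __name__ == "__main__":
    print(declares_gpu(SAMPLE_RESOURCE_TYPES))
```

To check a real cluster, read the file from the Hadoop configuration directory and pass its contents to `declares_gpu`. The ResourceManager and NodeManagers normally need a restart before a changed resource type takes effect.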

bboy-yang commented 5 years ago

Thanks! I'll give it a try.

bboy-yang commented 5 years ago

The earlier YARN GPU configuration problem is solved, but now I have run into the error below. How can I fix it? (The log viewer shows only 4096 bytes of the log.)

elease
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: 1 container has reallocated
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: response container Container: [ContainerId: container_e09_1548225183932_0002_01_000047, AllocationRequestId: 0, Version: 0, NodeId: host-11-3-219-10:11068, NodeHttpAddress: host-11-3-219-10:8042, Resource: <memory:4096, vCores:4>, Priority: 5, Token: Token { kind: ContainerToken, service: 11.3.219.10:11068 }, ExecutionType: GUARANTEED, ]
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: response container Container: [ContainerId: container_e09_1548225183932_0002_01_000047, AllocationRequestId: 0, Version: 0, NodeId: host-11-3-219-10:11068, NodeHttpAddress: host-11-3-219-10:8042, Resource: <memory:4096, vCores:4>, Priority: 5, Token: Token { kind: ContainerToken, service: 11.3.219.10:11068 }, ExecutionType: GUARANTEED, ] matches request Request: [role: worker, index: 7, request: Capability[<memory:4000, vCores:4>]Priority[5]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null], failtimes: FailoverTimes [failoverTimes=2], ]
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: response container size 1
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: Completed container container_e09_1548225183932_0002_01_000046 finish state is COMPLETE exit status -100
2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: released conatiner container_e09_1548225183932_0002_01_000046 status -100
2019-01-23 14:46:37,653 INFO xdl.AppMasterBase: Completed container container_e09_1548225183932_0002_01_000034 finish state is COMPLETE exit status 126
2019-01-23 14:46:37,653 INFO xdl.AppMasterBase: container_e09_1548225183932_0002_01_000034 work container lost, lose exit status is 126, Launch it again
2019-01-23 14:46:37,653 INFO xdl.AppMasterBase: node worker:8 fail times FailoverTimes [failoverTimes=2]
2019-01-23 14:46:37,660 INFO xdl.AppMasterBase: 1 container has reallocated
2019-01-23 14:46:37,660 INFO xdl.AppMasterBase: response container Container: [ContainerId: container_e09_1548225183932_0002_01_000049, AllocationRequestId: 0, Version: 0, NodeId: host-11-3-221-38:27858, NodeHttpAddress: host-11-3-221-38:8042, Resource: <memory:4096, vCores:4>, Priority: 5, Token: Token { kind: ContainerToken, service: 11.3.221.38:27858 }, ExecutionType: GUARANTEED, ]
2019-01-23 14:46:37,660 INFO xdl.AppMasterBase: response container Container: [ContainerId: container_e09_1548225183932_0002_01_000049, AllocationRequestId: 0, Version: 0, NodeId: host-11-3-221-38:27858, NodeHttpAddress: host-11-3-221-38:8042, Resource: <memory:4096, vCores:4>, Priority: 5, Token: Token { kind: ContainerToken, service: 11.3.221.38:27858 }, ExecutionType: GUARANTEED, ] matches request Request: [role: worker, index: 8, request: Capability[<memory:4000, vCores:4>]Priority[5]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null], failtimes: FailoverTimes [failoverTimes=2], ]
2019-01-23 14:46:39,660 INFO xdl.AppMasterBase: container has failed 21 times, shutdown this application
2019-01-23 14:46:39,662 ERROR xdl.AppMasterRunner: run error!
java.lang.RuntimeException: container has failed 21 times,shutdown this application
    at com.alibaba.xdl.AppMasterBase.processResponse(AppMasterBase.java:399)
    at com.alibaba.xdl.AppMasterBase.requestFailoverNodes(AppMasterBase.java:524)
    at com.alibaba.xdl.AppMasterBase.processResponse(AppMasterBase.java:362)
    at com.alibaba.xdl.AppMasterBase.waitForWorkerFinish(AppMasterBase.java:303)
    at com.alibaba.xdl.AppMasterBase.run(AppMasterBase.java:182)
    at com.alibaba.xdl.AppMasterRunner.main(AppMasterRunner.java:81)
2019-01-23 14:46:39,675 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2019-01-23 14:46:39,867 INFO imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2019-01-23 14:46:39,894 INFO zookeeper.ZooKeeper: Session: 0x167dffacc39001e closed
2019-01-23 14:46:39,895 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x167dffacc39001e

yiling-dc commented 5 years ago

This log means the containers were started. To find the actual cause of the failure, you need to look at the container logs.
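One way to decide which container logs to open first is to pull the failed container IDs and exit statuses out of the ApplicationMaster log. The following is a minimal sketch, assuming the log lines keep the `Completed container ... exit status N` wording seen in this thread. The regular expression and the function name `failed_containers` are illustrative and not part of XDL.

```python
import re

# Matches lines such as:
#   "... Completed container container_e09_..._000034 finish state is COMPLETE exit status 126"
EXIT_RE = re.compile(
    r"Completed container (\S+) finish state is \S+ exit status (-?\d+)"
)


def failed_containers(log_text):
    """Return (container_id, exit_status) pairs for containers with a non-zero status."""
    results = []
    for container_id, status in EXIT_RE.findall(log_text):
        code = int(status)
        if code != 0:
            results.append((container_id, code))
    return results


if __name__ == "__main__":
    sample = (
        "2019-01-23 14:46:35,588 INFO xdl.AppMasterBase: Completed container "
        "container_e09_1548225183932_0002_01_000046 finish state is COMPLETE exit status -100 "
        "2019-01-23 14:46:37,653 INFO xdl.AppMasterBase: Completed container "
        "container_e09_1548225183932_0002_01_000034 finish state is COMPLETE exit status 126"
    )
    for container_id, code in failed_containers(sample):
        print(container_id, code)
```

With an ID in hand, the log of a single container can usually be fetched with `yarn logs -applicationId <application id> -containerId <container id>`, provided log aggregation is enabled on the cluster. In POSIX shells, exit status 126 conventionally means a command was found but could not be executed, which often points to a permission problem.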

bboy-yang commented 5 years ago

I checked the container log and found this error: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied
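The message says that the OS account running the container process is not allowed to open the Docker daemon socket. A quick way to confirm this on a NodeManager host is to run a check like the sketch below as that same account. The function name `can_use_docker_socket` is made up for this example; the socket path comes from the error message.

```python
import os

DOCKER_SOCK = "/var/run/docker.sock"  # path taken from the error message


def can_use_docker_socket(path=DOCKER_SOCK):
    """Return True if the current user may connect to the socket at `path`.

    Connecting to a Unix domain socket on Linux requires write permission
    on the socket file, so that is the permission tested here.
    """
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    if can_use_docker_socket():
        print("current user can open", DOCKER_SOCK)
    else:
        print("current user cannot open", DOCKER_SOCK)
```

If the check fails, a common remedy is to add the account that runs the NodeManager to the `docker` group and restart the NodeManager. Whether that is appropriate depends on the cluster's security policy, since membership in that group is effectively root-equivalent on the host.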

Here is the detailed log:
2019-01-23 15:54:48,593 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Container workdir is:[/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005]
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Current xdl app name is: application_1548225183932_0003, config path: config.json
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Creating XDLJobContainer:[worker] with index[0], XDL app zookeeper address:[/xdl].
2019-01-23 15:54:49,440 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2019-01-23 15:54:49,512 INFO imps.CuratorFrameworkImpl: Starting
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:host.name=host-11-3-220-41
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_121
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.home=/export/servers/jdk1.8.0_121/jre
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client 
environment:java.class.path=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar:/home/hadoop/hadoop-3.1.1/etc/hadoop:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-kms-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/li
b/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/hom
e/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1
/share/hadoop/common/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs
/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/
share/hadoop/hdfs/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okio-1.6.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop
/hdfs/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-ajax-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-core-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-serve
r-tests-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-registry-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-router-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/snakeyaml-1.16.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/java-util-1.9.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-client-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/fst-2.50.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/objenesis-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-base-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/json-io-2.5.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/home/hadoop/hadoop-3.1.1/share
/hadoop/yarn/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/dnsjava-2.1.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-4.0.jar 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.compiler= 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.28.3.el7.x86_64 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.dir=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@57459491 2019-01-23 15:54:49,533 INFO zookeeper.ClientCnxn: Opening socket connection to server 11.3.220.133/11.3.220.133:2181. 
Will not attempt to authenticate using SASL (unknown error) 2019-01-23 15:54:49,534 INFO zookeeper.ClientCnxn: Socket connection established to 11.3.220.133/11.3.220.133:2181, initiating session 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 to need volume dir list. 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/config.json is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16/config.json 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 to need volume dir list. 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/songyang31.tar.gz is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 to need volume dir list. 
2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is config.json 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is songyang31.tar.gz 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: Binding /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 in bindTarVolume 2019-01-23 15:54:49,538 INFO xdl.ContainerBase: XDL docker container start cmd: docker run --expose=8888 -m=4194304000 -w=/home/admin/xdl/songyang31 -t --name=xdl_application_1548225183932_0003_worker_0_000005 --cpu-period=100000 --cpu-quota=400000 -c 4 --net=host -e PYTHONPATH=/home/admin/xdl/python:/home/admin/xdl:/home/admin/xdl/songyang31 -e vp_method=anneal -e meta_dir=hdfs://11.3.220.133:9000/user/recsys/admin/rank/xdl/meta_info --entrypoint=bash -v=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005:/home/admin/xdl -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz/songyang31:/home/admin/xdl/songyang31:rw 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1 -c 'ulimit -c unlimited && source /etc/profile && python deepctr.py --task_name worker --task_index 0 --run_mode dist --zk_addr zfs://11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181/psplus/application_1548225183932_0003 --app_id application_1548225183932_0003 --config ../config.json '; stop cmd: docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005 2019-01-23 15:54:49,538 INFO xdl.ContainerBase: pull image [docker pull 
172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] 2019-01-23 15:54:49,587 INFO zookeeper.ClientCnxn: Session establishment complete on server 11.3.220.133/11.3.220.133:2181, sessionid = 0x167dffacc390025, negotiated timeout = 40000 2019-01-23 15:54:49,596 INFO state.ConnectionStateManager: State change: CONNECTED Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,629 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 91 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,699 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 161 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,771 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 233 ms status [1] 2019-01-23 15:54:49,771 INFO xdl.ContainerBase: pull image cost 233 ms docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/create?name=xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied. 
See 'docker run --help'. 2019-01-23 15:55:09,789 INFO xdl.ContainerBase: stop container cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:09,892 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 102 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:09,985 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 195 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 282 ms status [1] 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: stop container cost 282 ms 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: rm container cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,157 INFO xdl.ContainerBase: cmd [docker rm 
xdl_application_1548225183932_0003_worker_0_000005] cost 368 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,230 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 441 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,297 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 508 ms status [1] 2019-01-23 15:55:10,298 INFO xdl.ContainerBase: remove container cost 509 ms 2019-01-23 15:55:10,298 INFO xdl.ContainerBase: destroy processor cost 509 ms 2019-01-23 15:55:10,298 ERROR xdl.ContainerBase: Docker container run failed, Exit code is 126 2019-01-23 15:55:10,298 INFO xdl.Utils: ================================================================================================== 2019-01-23 15:55:10,298 INFO xdl.Utils: ===Container local restart 2019-01-23 15:55:10,298 INFO xdl.Utils: ================================================================================================== 2019-01-23 15:55:10,301 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 
to need volume dir list. 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/config.json is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16/config.json 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 to need volume dir list. 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/songyang31.tar.gz is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 to need volume dir list. 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is config.json 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is songyang31.tar.gz 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: Binding /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 in bindTarVolume 2019-01-23 15:55:10,303 INFO xdl.ContainerBase: XDL docker container start cmd: docker run --expose=8888 -m=4194304000 -w=/home/admin/xdl/songyang31 -t --name=xdl_application_1548225183932_0003_worker_0_000005 --cpu-period=100000 --cpu-quota=400000 -c 4 --net=host -e PYTHONPATH=/home/admin/xdl/python:/home/admin/xdl:/home/admin/xdl/songyang31 -e vp_method=anneal -e meta_dir=hdfs://11.3.220.133:9000/user/recsys/admin/rank/xdl/meta_info --entrypoint=bash 
-v=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005:/home/admin/xdl -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz/songyang31:/home/admin/xdl/songyang31:rw 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1 -c 'ulimit -c unlimited && source /etc/profile && python deepctr.py --task_name worker --task_index 0 --run_mode dist --zk_addr zfs://11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181/psplus/application_1548225183932_0003 --app_id application_1548225183932_0003 --config ../config.json '; stop cmd: docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005 2019-01-23 15:55:10,304 INFO xdl.ContainerBase: pull image [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,374 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 71 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,446 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 143 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at 
unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,527 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 224 ms status [1] 2019-01-23 15:55:10,527 INFO xdl.ContainerBase: pull image cost 224 ms docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/create?name=xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied. See 'docker run --help'. 2019-01-23 15:55:30,530 INFO xdl.ContainerBase: stop container cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,634 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 104 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,737 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 207 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix 
/var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 305 ms status [1] 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: stop container cost 305 ms 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: rm container cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,941 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 411 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:31,028 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 498 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 602 ms status [1] 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: remove container cost 603 ms 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: destroy processor cost 603 ms 2019-01-23 15:55:31,133 ERROR xdl.ContainerBase: Docker container run failed, Exit code is 126

yiling-dc commented 5 years ago

I checked the container log and found that the failure is caused by this error: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied
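A quick way to confirm this kind of permission problem is to inspect the Docker socket as the same OS user that runs the container process (the log reports `user.name=hadoop`). The following is only a minimal diagnostic sketch, not part of XDL: the helper name `describe_socket_access` is hypothetical, and the socket path is taken from the error message above.

```python
import grp
import os
import pwd
import stat

# Path copied from the error message; adjust if the daemon uses another socket.
DOCKER_SOCK = "/var/run/docker.sock"


def describe_socket_access(path=DOCKER_SOCK):
    """Report whether the current user can read and write the given socket path.

    Hypothetical helper for illustration; it only inspects file metadata and
    never talks to the Docker daemon.
    """
    uid = os.geteuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)  # uid has no passwd entry (common inside containers)

    info = {"user": user, "path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        return info

    st = os.stat(path)
    info["is_socket"] = stat.S_ISSOCK(st.st_mode)
    info["mode"] = stat.filemode(st.st_mode)
    try:
        info["group"] = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        info["group"] = str(st.st_gid)
    # True if the socket's group is among the groups of this process.
    info["user_in_group"] = st.st_gid == os.getegid() or st.st_gid in os.getgroups()
    # Connecting to a Unix socket requires write permission on it.
    info["can_connect"] = os.access(path, os.R_OK | os.W_OK)
    return info


if __name__ == "__main__":
    for key, value in describe_socket_access().items():
        print(f"{key}: {value}")
```

Run it on the NodeManager host as the user that launches the containers. If `can_connect` is false and `user_in_group` is false, one common remedy (an assumption about this cluster, not something the log confirms) is to add that user to the group that owns the socket, often `docker`, and then restart the NodeManager so the new group membership takes effect.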

The detailed log follows:
2019-01-23 15:54:48,593 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Container workdir is:[/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005]
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Current xdl app name is: application_1548225183932_0003, config path: config.json
2019-01-23 15:54:49,438 INFO xdl.ContainerBase: Creating XDLJobContainer:[worker] with index[0], XDL app zookeeper address:[/xdl].
2019-01-23 15:54:49,440 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2019-01-23 15:54:49,512 INFO imps.CuratorFrameworkImpl: Starting
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:host.name=host-11-3-220-41
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_121
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client environment:java.home=/export/servers/jdk1.8.0_121/jre
2019-01-23 15:54:49,518 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar:/home/hadoop/hadoop-3.1.1/etc/hadoop:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-kms-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/hadoop-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/li
b/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/hom
e/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1
/share/hadoop/common/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-native-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-nfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-client-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-rbf-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/hadoop-hdfs-3.1.1-tests.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/httpclient-4.5.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs
/lib/paranamer-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-auth-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/accessors-smart-1.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-client-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/hadoop-annotations-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-server-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/asm-5.0.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-framework-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-webapp-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-io-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-net-3.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-http-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop-3.1.1/
share/hadoop/hdfs/lib/httpcore-4.4.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/zookeeper-3.4.9.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okio-1.6.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/re2j-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/gson-2.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang3-3.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-security-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop
/hdfs/lib/jettison-1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-util-ajax-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/avro-1.7.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/xz-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-servlet-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/json-smart-2.3.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jetty-xml-9.3.19.v20170502.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-databind-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-io-2.5.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jsch-0.1.54.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/curator-recipes-2.12.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-services-core-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-serve
r-tests-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-api-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-registry-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-client-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-router-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/hadoop-yarn-server-common-3.1.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/snakeyaml-1.16.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/java-util-1.9.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-client-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/fst-2.50.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/objenesis-1.0.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-base-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/json-io-2.5.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/home/hadoop/hadoop-3.1.1/share
/hadoop/yarn/lib/metrics-core-3.2.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/dnsjava-2.1.7.jar:/home/hadoop/hadoop-3.1.1/share/hadoop/yarn/lib/guice-4.0.jar 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:java.compiler= 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.28.3.el7.x86_64 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Client environment:user.dir=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005 2019-01-23 15:54:49,519 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@57459491 2019-01-23 15:54:49,533 INFO zookeeper.ClientCnxn: Opening socket connection to server 11.3.220.133/11.3.220.133:2181. 
Will not attempt to authenticate using SASL (unknown error) 2019-01-23 15:54:49,534 INFO zookeeper.ClientCnxn: Socket connection established to 11.3.220.133/11.3.220.133:2181, initiating session 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 to need volume dir list. 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/config.json is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16/config.json 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 to need volume dir list. 2019-01-23 15:54:49,536 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/songyang31.tar.gz is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 to need volume dir list. 
2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is config.json 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is songyang31.tar.gz 2019-01-23 15:54:49,537 INFO xdl.DockerCmdBuilder: Binding /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 in bindTarVolume 2019-01-23 15:54:49,538 INFO xdl.ContainerBase: XDL docker container start cmd: docker run --expose=8888 -m=4194304000 -w=/home/admin/xdl/songyang31 -t --name=xdl_application_1548225183932_0003_worker_0_000005 --cpu-period=100000 --cpu-quota=400000 -c 4 --net=host -e PYTHONPATH=/home/admin/xdl/python:/home/admin/xdl:/home/admin/xdl/songyang31 -e vp_method=anneal -e meta_dir=hdfs://11.3.220.133:9000/user/recsys/admin/rank/xdl/meta_info --entrypoint=bash -v=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005:/home/admin/xdl -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz/songyang31:/home/admin/xdl/songyang31:rw 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1 -c 'ulimit -c unlimited && source /etc/profile && python deepctr.py --task_name worker --task_index 0 --run_mode dist --zk_addr zfs://11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181/psplus/application_1548225183932_0003 --app_id application_1548225183932_0003 --config ../config.json '; stop cmd: docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005 2019-01-23 15:54:49,538 INFO xdl.ContainerBase: pull image [docker pull 
172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] 2019-01-23 15:54:49,587 INFO zookeeper.ClientCnxn: Session establishment complete on server 11.3.220.133/11.3.220.133:2181, sessionid = 0x167dffacc390025, negotiated timeout = 40000 2019-01-23 15:54:49,596 INFO state.ConnectionStateManager: State change: CONNECTED Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,629 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 91 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,699 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 161 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:54:49,771 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 233 ms status [1] 2019-01-23 15:54:49,771 INFO xdl.ContainerBase: pull image cost 233 ms docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/create?name=xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied. 
See 'docker run --help'. 2019-01-23 15:55:09,789 INFO xdl.ContainerBase: stop container cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:09,892 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 102 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:09,985 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 195 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 282 ms status [1] 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: stop container cost 282 ms 2019-01-23 15:55:10,071 INFO xdl.ContainerBase: rm container cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,157 INFO xdl.ContainerBase: cmd [docker rm 
xdl_application_1548225183932_0003_worker_0_000005] cost 368 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,230 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 441 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,297 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 508 ms status [1] 2019-01-23 15:55:10,298 INFO xdl.ContainerBase: remove container cost 509 ms 2019-01-23 15:55:10,298 INFO xdl.ContainerBase: destroy processor cost 509 ms 2019-01-23 15:55:10,298 ERROR xdl.ContainerBase: Docker container run failed, Exit code is 126 2019-01-23 15:55:10,298 INFO xdl.Utils: ================================================================================================== 2019-01-23 15:55:10,298 INFO xdl.Utils: ===Container local restart 2019-01-23 15:55:10,298 INFO xdl.Utils: ================================================================================================== 2019-01-23 15:55:10,301 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17/xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 
to need volume dir list. 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/config.json is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16/config.json 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 to need volume dir list. 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: /export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005/songyang31.tar.gz is symbolic link, actual path is: /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz 2019-01-23 15:55:10,302 INFO xdl.DockerCmdBuilder: Add /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 to need volume dir list. 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is xdl-yarn-scheduler-1.0.0-SNAPSHOT-jar-with-dependencies.jar 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is config.json 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: volumeDirInLocal is songyang31.tar.gz, file is songyang31.tar.gz 2019-01-23 15:55:10,303 INFO xdl.DockerCmdBuilder: Binding /export/hadoop/hadoop/tmp/nm-local-dir/filecache/18 in bindTarVolume 2019-01-23 15:55:10,303 INFO xdl.ContainerBase: XDL docker container start cmd: docker run --expose=8888 -m=4194304000 -w=/home/admin/xdl/songyang31 -t --name=xdl_application_1548225183932_0003_worker_0_000005 --cpu-period=100000 --cpu-quota=400000 -c 4 --net=host -e PYTHONPATH=/home/admin/xdl/python:/home/admin/xdl:/home/admin/xdl/songyang31 -e vp_method=anneal -e meta_dir=hdfs://11.3.220.133:9000/user/recsys/admin/rank/xdl/meta_info --entrypoint=bash 
-v=/export/hadoop/hadoop/tmp/nm-local-dir/usercache/admin/appcache/application_1548225183932_0003/container_e09_1548225183932_0003_01_000005:/home/admin/xdl -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/17 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16:/export/hadoop/hadoop/tmp/nm-local-dir/filecache/16 -v=/export/hadoop/hadoop/tmp/nm-local-dir/filecache/18/songyang31.tar.gz/songyang31:/home/admin/xdl/songyang31:rw 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1 -c 'ulimit -c unlimited && source /etc/profile && python deepctr.py --task_name worker --task_index 0 --run_mode dist --zk_addr zfs://11.3.220.133:2181,11.3.221.39:2181,11.3.219.41:2181/psplus/application_1548225183932_0003 --app_id application_1548225183932_0003 --config ../config.json '; stop cmd: docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005 2019-01-23 15:55:10,304 INFO xdl.ContainerBase: pull image [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,374 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 71 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,446 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 143 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at 
unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/images/create?fromImage=172.20.189.139%2Fxdl%2Fxdl&tag=ubuntu-cpu-tf1.12-hadoop3.1.1: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:10,527 INFO xdl.ContainerBase: cmd [docker pull 172.20.189.139/xdl/xdl:ubuntu-cpu-tf1.12-hadoop3.1.1] cost 224 ms status [1] 2019-01-23 15:55:10,527 INFO xdl.ContainerBase: pull image cost 224 ms docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/create?name=xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied. See 'docker run --help'. 2019-01-23 15:55:30,530 INFO xdl.ContainerBase: stop container cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,634 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 104 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,737 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 207 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005/stop?t=30: dial unix 
/var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: cmd [docker stop -t 30 xdl_application_1548225183932_0003_worker_0_000005] cost 305 ms status [1] 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: stop container cost 305 ms 2019-01-23 15:55:30,835 INFO xdl.ContainerBase: rm container cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:30,941 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 411 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:31,028 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 498 ms status [1] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/xdl_application_1548225183932_0003_worker_0_000005: dial unix /var/run/docker.sock: connect: permission denied 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: cmd [docker rm xdl_application_1548225183932_0003_worker_0_000005] cost 602 ms status [1] 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: remove container cost 603 ms 2019-01-23 15:55:31,133 INFO xdl.ContainerBase: destroy processor cost 603 ms 2019-01-23 15:55:31,133 ERROR xdl.ContainerBase: Docker container run failed, Exit code is 126

Please confirm that the user that starts YARN is in the docker user group.
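A quick way to check this on a NodeManager host is sketched below. It is only an illustration: the user name `hadoop` is assumed from the `user.name=hadoop` entry in the log above, the group name `docker` is Docker's default on Linux, and the helper names `user_in_group` and `socket_group` are made up for this example.

```python
import grp
import os
import pwd


def user_in_group(user: str, group: str) -> bool:
    """Return True if `user` has `group` as primary or supplementary group."""
    try:
        group_entry = grp.getgrnam(group)
        user_entry = pwd.getpwnam(user)
    except KeyError:
        # Either the user or the group does not exist on this host.
        return False
    return user_entry.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem


def socket_group(path: str = "/var/run/docker.sock"):
    """Return the name of the group owning the Docker socket, or None if it cannot be read."""
    try:
        info = os.stat(path)
    except OSError:
        return None
    try:
        return grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # The gid has no name in the group database; fall back to the number.
        return str(info.st_gid)


if __name__ == "__main__":
    yarn_user = "hadoop"  # assumption: taken from user.name=hadoop in the log
    print("group owning the docker socket:", socket_group())
    print(f"{yarn_user} in docker group:", user_in_group(yarn_user, "docker"))
```

If the check reports that the user is not in the group, the usual remedy is to add it (for example `sudo usermod -aG docker hadoop`) and then restart the NodeManager, since a running process does not pick up new group membership.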