Qihoo360 / hbox

AI on Hadoop
Apache License 2.0
1.73k stars 385 forks

x-learning: demo fails at the end when two users run it at the same time #56

Closed sydpz closed 5 years ago

sydpz commented 5 years ago

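The example I ran is xlearning/examples/tensorflow/run.sh. The job fails when it reaches 95%. On the worker node, the error is a directory permission problem:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=bing.wb, access=WRITE, inode="/tmp/XLearning/eventLog":xxxxxxx:supergroup:drwxr-xr-x

The directory belongs to xxxxxxx, a colleague who created it when running the command earlier, so my job cannot write to it and fails. However, I had already redirected eventLog before running the demo, and that change does not seem to have taken effect.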

[bing.wb@e92l09627.em21 /home/bing.wb/xlearning/conf]
$grep -b2  event xlearning-site.xml
1450-    <property>
1465-        <name>xlearning.tf.board.history.dir</name>
1517:        <value>/tmp/bing.wb/XLearning/eventLog</value>
1572-    </property>
1588-    <property>

jiarunying commented 5 years ago

Please specify it in the submit script: --conf xlearning.tf.board.history.dir=/tmp/bing.wb/XLearning/eventLog
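As a rough sketch of how the --conf override above could be generated per user, so that two people never write to the same eventLog directory: the helper names `history_dir_for` and `history_conf_for` are hypothetical and not part of XLearning; only the property name and the path layout come from this thread.

```shell
# Hypothetical helper (not part of XLearning): per-user TensorBoard history
# directory, following the /tmp/<user>/XLearning/eventLog layout used above.
history_dir_for() {
    printf '/tmp/%s/XLearning/eventLog' "$1"
}

# Build the --conf override for the given user name.
history_conf_for() {
    printf '%s%s' '--conf xlearning.tf.board.history.dir=' "$(history_dir_for "$1")"
}

# Print the flag for the user from this thread.
history_conf_for "bing.wb"; echo
# prints: --conf xlearning.tf.board.history.dir=/tmp/bing.wb/XLearning/eventLog
```

Assuming the submit command in run.sh accepts extra options at the end, the result could be appended as `$(history_conf_for "$(id -un)")`, left unquoted so that it splits into the flag and its value.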

sydpz commented 5 years ago

Thanks.