Qihoo360 / hbox

AI on Hadoop
Apache License 2.0

How should environment variables that exceed the maximum length be handled? #66

Closed: daniel985 closed this issue 5 months ago

daniel985 commented 5 years ago

The error is as follows:

19/02/21 17:42:52 INFO XLearningContainer: Cluster def is: {"ps":["node38.hadoop.com:28249"],"worker":["node36.hadoop.com:24129","node5.hadoop.com:25401","node11.hadoop.com:26075","node18.hadoop.com:21271","node62.hadoop.com:29775"]}
19/02/21 17:42:52 WARN XLearningContainer: Current container environments length 576369 exceed the configuration xlearning.env.maxlength 102400
19/02/21 17:42:52 WARN XLearningContainer: InputFile list had written to local file: inputFileList.txt !!
19/02/21 17:42:52 INFO XLearningContainer: Executing command:bash -x dist_train.sh 194063 ./tfmodel
19/02/21 17:42:52 ERROR XLearningContainer: Some errors has occurred during container running!
java.io.IOException: Cannot run program "bash": error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at java.lang.Runtime.exec(Runtime.java:620)
    at java.lang.Runtime.exec(Runtime.java:450)
    at java.lang.Runtime.exec(Runtime.java:388)
    at net.qihoo.xlearning.container.XLearningContainer.run(XLearningContainer.java:673)
    at net.qihoo.xlearning.container.XLearningContainer.main(XLearningContainer.java:983)
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 5 more

It looks like there are too many part files, so the environment variables passed to the container become too large. How should this be handled?
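The two WARN lines suggest that when the environment would exceed xlearning.env.maxlength, the container falls back to writing the input file list to a local file (inputFileList.txt) instead of passing it inline. A minimal sketch of that kind of spill-over, assuming hypothetical environment keys INPUT_FILE_LIST and INPUT_FILE_LIST_PATH and a hypothetical class name; this is not hbox/XLearning's actual code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class name and env keys are illustrative only.
final class InputListEnv {
    // Hypothetical key carrying the comma-joined input file list inline.
    static final String INPUT_LIST_ENV = "INPUT_FILE_LIST";
    // Hypothetical key pointing at the file the list was spilled to.
    static final String INPUT_LIST_FILE_ENV = "INPUT_FILE_LIST_PATH";

    private InputListEnv() {}

    /**
     * Puts the joined input list into env if it is at most maxLength characters;
     * otherwise writes one path per line to listFile and stores only that path.
     * Returns true if the list was spilled to the file.
     */
    static boolean putInputList(Map<String, String> env, List<String> inputFiles,
                                int maxLength, Path listFile) throws IOException {
        String joined = String.join(",", inputFiles);
        if (joined.length() <= maxLength) {
            env.put(INPUT_LIST_ENV, joined);
            env.remove(INPUT_LIST_FILE_ENV);
            return false;
        }
        // Too long for the environment: write the list to disk instead.
        Files.write(listFile, inputFiles, StandardCharsets.UTF_8);
        env.remove(INPUT_LIST_ENV);
        env.put(INPUT_LIST_FILE_ENV, listFile.toAbsolutePath().toString());
        return true;
    }
}
```

The training script would then read the file named by the path variable rather than a giant inline value. Note that this only helps if no other single variable is still oversized.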

daniel985 commented 5 years ago

I changed xlearning.env.maxlength, but I still get the Cannot run program "bash": error=7, Argument list too long error.

liyuance commented 5 years ago

This is limited by the operating system environment. We already have a fix and will submit a PR soon.
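One likely reason raising xlearning.env.maxlength does not help: on Linux, execve() fails with E2BIG (error=7, "Argument list too long") when any single argument or environment string exceeds MAX_ARG_STRLEN, which is 32 pages (131072 bytes with 4 KiB pages, counting the terminating NUL); the combined size of arguments and environment is limited separately. If a value of the size reported in the log (576369) lands in one variable, it is far past that per-string limit no matter what the configuration allows. A minimal sketch that flags such entries before ProcessBuilder.start() is called, assuming 4 KiB pages and UTF-8 encoding of the environment; the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes Linux with 4 KiB pages and UTF-8 environment encoding.
final class ExecEnvCheck {
    // MAX_ARG_STRLEN on Linux is 32 pages; 32 * 4096 bytes assuming 4 KiB pages.
    static final int MAX_ARG_STRLEN = 32 * 4096;

    private ExecEnvCheck() {}

    /** Returns the keys whose "KEY=VALUE" string plus NUL exceeds MAX_ARG_STRLEN bytes. */
    static List<String> oversizedKeys(Map<String, String> env) {
        List<String> tooLong = new ArrayList<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String entry = e.getKey() + "=" + e.getValue();
            // +1 for the terminating NUL that execve() counts.
            int bytes = entry.getBytes(StandardCharsets.UTF_8).length + 1;
            if (bytes > MAX_ARG_STRLEN) {
                tooLong.add(e.getKey());
            }
        }
        return tooLong;
    }
}
```

Any key it reports has to be moved out of the environment, for example into a local file like the inputFileList.txt fallback mentioned in the log.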