Qihoo360 / hbox

AI on Hadoop
Apache License 2.0

HDFS problem when running tf.estimator in Docker #71

Closed itscarrot closed 3 months ago

itscarrot commented 5 years ago

Submitting without Docker runs fine, but submitting with Docker fails with an error.

The launch command is as follows:

$XLEARNING_HOME/bin/xl-submit \
   --app-type "tensorflow" \
   --app-name "tf-estimator-demo" \
   --files demo.py \
   --launch-cmd "python demo.py --data_path=hdfs://nameservicestream/model/tmp/data/tfEstimator --model_path=hdfs://nameservicestream/tmp/estimatorDemoModel" \
   --worker-memory 2G \
   --worker-num 3 \
   --worker-cores 2 \
   --ps-memory 2G \
   --ps-num 1 \
   --ps-cores 2 \
   --tf-evaluator true \
   --queue default \
   --conf xlearning.container.type=docker \
   --conf xlearning.docker.image=docker.io/tensorflow/tensorflow:1.12.0-devel-py3 \
   --conf xlearning.docker.worker.dir=work

The error output is as follows:

19/08/22 19:03:23 INFO XLearningContainer: Use tf.data instead.
19/08/22 19:03:23 INFO XLearningContainer: Environment variable CLASSPATH not set!
19/08/22 19:03:23 INFO XLearningContainer: getJNIEnv: getGlobalJNIEnv failed
19/08/22 19:03:23 INFO XLearningContainer: Traceback (most recent call last):
19/08/22 19:03:23 INFO XLearningContainer: File "demo.py", line 134, in <module>
19/08/22 19:03:23 INFO XLearningContainer: tf.app.run(main=main)
19/08/22 19:03:23 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
19/08/22 19:03:23 INFO XLearningContainer: _sys.exit(main(argv))
19/08/22 19:03:23 INFO XLearningContainer: File "demo.py", line 52, in main
19/08/22 19:03:23 INFO XLearningContainer: features_dtype=np.float32)
19/08/22 19:03:23 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
19/08/22 19:03:23 INFO XLearningContainer: return func(*args, **kwargs)
19/08/22 19:03:23 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 53, in load_csv_with_header
19/08/22 19:03:23 INFO XLearningContainer: header = next(data_file)
19/08/22 19:03:24 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 214, in next
19/08/22 19:03:24 INFO XLearningContainer: retval = self.readline()
19/08/22 19:03:24 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 183, in readline
19/08/22 19:03:24 INFO XLearningContainer: self._preread_check()
19/08/22 19:03:24 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 85, in _preread_check
19/08/22 19:03:24 INFO XLearningContainer: compat.as_bytes(self.__name), 1024 * 512, status)
19/08/22 19:03:24 INFO XLearningContainer: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
19/08/22 19:03:24 INFO XLearningContainer: c_api.TF_GetCode(self.status.status))
19/08/22 19:03:24 INFO XLearningContainer: tensorflow.python.framework.errors_impl.NotFoundError: Unknown error 255
19/08/22 19:03:27 ERROR XLearningContainer: XLearningContainer run failed!

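The log lines "Environment variable CLASSPATH not set!" and "getJNIEnv: getGlobalJNIEnv failed" are what libhdfs prints when it cannot start a JVM, which suggests the process inside the container does not see the Hadoop environment that the host provides. As a rough sketch only, not a confirmed fix for this issue, a preflight check like the one below could be called at the top of a script such as demo.py before any hdfs:// path is opened. The variable list follows TensorFlow's Hadoop deployment guide and does not come from this thread; the function name `missing_hdfs_env` is hypothetical.

```python
import os

# Variables that TensorFlow's HDFS support (libhdfs via JNI) expects.
# Assumption: taken from TensorFlow's Hadoop guide, not from this issue.
REQUIRED_HDFS_ENV = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")


def missing_hdfs_env(paths, environ=None):
    """Return the required variables that are unset or empty.

    Returns an empty list when none of the given paths is an hdfs:// URI,
    because local paths do not need the Hadoop environment.
    """
    environ = os.environ if environ is None else environ
    if not any(p.startswith("hdfs://") for p in paths):
        return []
    return [name for name in REQUIRED_HDFS_ENV if not environ.get(name)]


if __name__ == "__main__":
    # Path copied from the launch command in this issue.
    data_path = "hdfs://nameservicestream/model/tmp/data/tfEstimator"
    missing = missing_hdfs_env([data_path])
    if missing:
        print("HDFS access will likely fail; unset variables: " + ", ".join(missing))
```

If the check reports missing variables, the image probably needs a Java runtime and a Hadoop client, with CLASSPATH populated from `hadoop classpath --glob`. Whether and how hbox passes these into the Docker container is not covered in this thread.
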
jiayuhan-it commented 3 months ago

Please open a new issue if needed.