When I run the XLearning GPU example like this:
$XLEARNING_HOME/bin/xl-submit \
  --app-type "tensorflow" \
  --app-name "tf-demo" \
  --input /tmp/data/tensorflow#data \
  --output /tmp/tensorflow_model#model \
  --files demo.py,dataDeal.py \
  --launch-cmd "python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog --training_epochs=10" \
  --worker-memory 8G \
  --worker-num 1 \
  --worker-cores 1 \
  --worker-gcores 1 \
  --ps-memory 1G \
  --ps-num 1 \
  --ps-cores 1 \
  --queue default
The result is:
But when I run the Hadoop GPU (distributed shell) example like this:
yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command /usr/local/nvidia/bin/nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
  -num_containers 2
The result is:
Environment: CentOS 7.6, XLearning-GPU 1.3, Hadoop 3.1.0
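If it helps narrow this down, one option is a minimal sketch that mirrors the distributed-shell test through XLearning: submit a trivial job whose launch command is just nvidia-smi, reusing only the flags from the xl-submit command above (the app name "gpu-check" is made up for illustration, and this assumes nvidia-smi is on the PATH inside the container):

# Hypothetical sanity check: does an XLearning worker container see a GPU at all?
$XLEARNING_HOME/bin/xl-submit \
  --app-type "tensorflow" \
  --app-name "gpu-check" \
  --launch-cmd "nvidia-smi" \
  --worker-memory 8G \
  --worker-num 1 \
  --worker-cores 1 \
  --worker-gcores 1 \
  --ps-memory 1G \
  --ps-num 1 \
  --ps-cores 1 \
  --queue default

Comparing its worker log with the distributed-shell output should show whether the GPU is being allocated to the XLearning container or only to the distributed-shell one.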