PRBonn / bonnet

Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics.
GNU General Public License v3.0

Error in cnn_train.py: pywrap_tensorflow.list_devices() could not find GPU #25

mengyuest closed 6 years ago

mengyuest commented 6 years ago

Hi~ I was trying to run cnn_train.py following the instructions in the README and ran into the following problem:

meng@meng:~/foo/bar/bonnet/train_py$ ./cnn_train.py -d cfg/persons/data.yaml -n cfg/persons/net_bonnet_inception.yaml  -t cfg/persons/train_bonnet_inception.yaml -l cfg/persons/logs/
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
----------
INTERFACE:
data yaml:  cfg/persons/data.yaml
net yaml:  cfg/persons/net_bonnet_inception.yaml
train yaml:  cfg/persons/train_bonnet_inception.yaml
log dir cfg/persons/logs/
model path None
model type iou
----------

Commit hash (training version):  b'2b24767'
----------

Opening desired data file cfg/persons/data.yaml
Opening desired net file cfg/persons/net_bonnet_inception.yaml
Opening desired train file cfg/persons/train_bonnet_inception.yaml
Copying files to cfg/persons/logs/ for further reference.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Training from scratch
Fetching dataset
Training with 1 GPU's
Training with batch size 36
DEVICE AVAIL:  /device:CPU:0
Number of GPU's available is 0
Traceback (most recent call last):
  File "./cnn_train.py", line 186, in <module>
    net.train()
  File "/home/meng/foo/bar/bonnet/train_py/arch/abstract_net.py", line 1019, in train
    assert(self.n_gpus == self.n_gpus_avail)
AssertionError
meng@meng:~/foo/bar/bonnet/train_py$ 

I tested the function pywrap_tensorflow.list_devices(), which is used in device_lib.list_local_devices() for self.gpu_available(), in the console and found that it printed

[b'\n\r/device:CPU:0\x12\x03CPU \x80\x80\x80\x80\x01*\x001\xe1B:\\\\\x8bgf']

And after conversion, it became

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8982061213019183087
]

It seems that it cannot find a GPU. How can I fix this? Thanks~
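For anyone who wants to reproduce the check outside bonnet, here is a minimal sketch of the same query. It assumes TensorFlow is importable from the interpreter used for training. The helper names `count_gpus` and `list_tf_devices` are made up for illustration and are not part of bonnet or TensorFlow; only `device_lib.list_local_devices()` is the real call mentioned above.

```python
def count_gpus(devices):
    """Count entries whose device_type is 'GPU'.

    `devices` is any iterable of objects with a `device_type` attribute,
    such as the list returned by device_lib.list_local_devices().
    """
    return sum(1 for d in devices if getattr(d, "device_type", None) == "GPU")


def list_tf_devices():
    """Return TensorFlow's local devices, or None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return device_lib.list_local_devices()


if __name__ == "__main__":
    devices = list_tf_devices()
    if devices is None:
        print("TensorFlow is not importable in this interpreter")
    else:
        for d in devices:
            # Each entry carries the same fields shown in the dump above.
            print(d.name, d.device_type)
        print("GPUs visible to TensorFlow:", count_gpus(devices))
```

If this reports zero GPUs, the assertion in abstract_net.py fails for the same reason as in the log: the train config asks for 1 GPU, but TensorFlow only sees the CPU.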

tano297 commented 6 years ago

Hi,

This seems to be a TensorFlow issue rather than a bonnet one. Try running $ nvidia-smi. If it lists a GPU, then try

$ sudo pip3 install --upgrade pip
$ sudo pip3 install --upgrade tensorflow-gpu

If neither of these works, please refer to the TensorFlow questions on Stack Overflow.
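The two checks above can also be scripted. This is a rough sketch, not something bonnet ships: the function names are hypothetical, and it assumes `nvidia-smi` is on the PATH when a driver is installed and that Python 3.8+ is available for `importlib.metadata`. It lists the GPUs the driver reports via `nvidia-smi -L` and shows which of the `tensorflow` and `tensorflow-gpu` pip packages are installed, since with TensorFlow 1.x the plain `tensorflow` package was the CPU-only build.

```python
import shutil
import subprocess
from importlib import metadata


def nvidia_smi_output(timeout=10):
    """Return the text printed by `nvidia-smi -L`, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


def parse_gpu_lines(text):
    """Keep only the lines of `nvidia-smi -L` output that describe a GPU."""
    return [ln.strip() for ln in text.splitlines() if ln.strip().startswith("GPU ")]


def installed_tf_packages():
    """Map each installed TensorFlow pip package name to its version."""
    found = {}
    for name in ("tensorflow", "tensorflow-gpu"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    out = nvidia_smi_output()
    if out is None:
        print("nvidia-smi not found or failed: check the NVIDIA driver first")
    else:
        print("Driver reports:", parse_gpu_lines(out))
    print("Installed TensorFlow packages:", installed_tf_packages())
```

If the driver reports a GPU but only `tensorflow` shows up in the package list, that would match the symptom in this issue and the upgrade to `tensorflow-gpu` suggested above.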