GPU-Training supported?

songpu2015617 commented 5 years ago

Dear Authors:

I want to know whether this code supports GPU training. I tried installing tensorflow-gpu, both the latest version and version 0.12.1, and I get errors saying that the tensor shapes don't match. Did you get the same error, and do you know how to fix it? Thanks

Bartzi commented 5 years ago

Well, our code should automatically put your data on your GPU... What kind of errors do you get? I cannot help you if you don't provide the exact error you are encountering ;)

songpu2015617 commented 5 years ago

Hi, Bartzi: Thanks for your reply. I installed tensorflow-gpu==1.12.0 and got the following error:

Logging to logs/2018-11-19-17-25-30
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    model_file_name = train(cli_args, log_dir)
  File "train.py", line 38, in train
    model = model_class.create_model(train_data_generator.get_input_shape(), config)
  File "/home/pu.song/Documents/ASRDev/LID/crnn-lid/keras/models/topcoder_crnn_finetune.py", line 54, in create_model
    model.add(Bidirectional(LSTM(512, return_sequences=False), merge_mode="concat"))
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/models.py", line 324, in add
    output_tensor = layer(self.outputs[0])
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/engine/topology.py", line 491, in __call__
    self.build(input_shapes[0])
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/wrappers.py", line 218, in build
    self.forward_layer.build(input_shape)
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/recurrent.py", line 733, in build
    self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 753, in concatenate
    return tf.concat(axis, [to_dense(x) for x in tensors])
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1122, in concat
    tensor_shape.scalar())
  File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 256, 512) and () are incompatible

When I install tensorflow-gpu==0.12.1, I get the following errors:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 9.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:517 in _set_model.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:521 in _set_model.: __init__ (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
Epoch 1/50
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)

It seems your keras code will eat up all the GPU memory very quickly. Thanks.

Bartzi commented 5 years ago

Our code is not eating all the available memory; that is a problem of TensorFlow, as TensorFlow always allocates all available memory...
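If the up-front allocation is a concern, TensorFlow can be asked to grow its GPU memory on demand. The following is only a minimal sketch, not something taken from this repository: it assumes the TensorFlow 0.x/1.x session API (`tf.ConfigProto`, `tf.Session`) and standalone Keras with the TensorFlow backend, and the function name `enable_gpu_memory_growth` is made up for illustration.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory as needed instead of all at once.

    Assumption: TensorFlow 0.x/1.x session API and standalone Keras.
    """
    # Imported lazily so this file can be loaded on a machine without TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # start small, grow when required
    session = tf.Session(config=config)
    K.set_session(session)  # make Keras use this session from now on
    return session
```

It would have to be called once, before the model is created, for example near the top of the training script.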

Let's have a look at your problems:

TensorFlow 1.12.0: it seems that the data loader does not supply the correct data format... are you using the correct data?
TensorFlow 0.12.1: you have a newer CuDNN library installed than the one expected by the library; the program tells you this:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

So you'll have to either compile the old TensorFlow version yourself, install a matching version of CuDNN, or use a more modern version of TensorFlow.
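To read the numbers in that message: the sketch below assumes the version integers are encoded as major * 1000 + minor * 100 + patch, and that the "compatibility version" simply drops the patch level. Both assumptions are inferred from the values in the log (6021 with 6000, 5105 with 5100) rather than taken from the TensorFlow source, and the function names are made up for illustration.

```python
def decode_cudnn_version(version):
    """Split an integer such as 6021 into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(version):
    """Drop the patch level, e.g. 6021 -> 6000 (assumed from the log)."""
    return version - version % 100


loaded, compiled = 6021, 5105  # values from the error message
print(decode_cudnn_version(loaded))    # (6, 0, 21)
print(decode_cudnn_version(compiled))  # (5, 1, 5)
print(compatibility_version(loaded) == compatibility_version(compiled))  # False
```

Under these assumptions, the machine has cuDNN 6.0 installed, while the tensorflow-gpu 0.12.1 binary was built against cuDNN 5.1.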

songpu2015617 commented 5 years ago

Thank you. I got it working on my machine.

nikhil031294 commented 4 years ago

@songpu2015617 Can you please tell me how you got it working on your machine? Also, what are your cuDNN and CUDA versions?

Thanks

Arafat4341 commented 4 years ago

Hello everyone! I am using Google Colab for training. I enabled the GPU, but it is not utilized. I get this message from Colab: You are not utilizing GPU runtime, please switch to standard runtime

How can I make this code utilize GPU of colab?!

bytosaur commented 3 years ago

@nikhil031294

I used Ubuntu 16.04
disabled the nouveau driver and used the shipped NVIDIA driver (384.130)
installed cuda 8.0 via runfile (https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html) but did not update the driver
then downloaded cuDNN 5.1 for CUDA 8.0 (https://developer.nvidia.com/rdp/cudnn-archive) and moved it to /usr/local/cuda-8.0/lib64 and the header to /usr/local/cuda-8.0/include
set the paths:
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
$ export PATH=/usr/local/cuda-8.0/bin:$PATH
cloned the repo and replaced tensorflow==0.12.1 with tensorflow-gpu==0.12.1 in requirements.txt before installing
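
The last step (swapping the package name in requirements.txt) could also be scripted. This is just a sketch: the helper name `use_gpu_package` and the sample file content are made up for illustration, and only the pin `tensorflow==0.12.1` comes from the steps above.

```python
import re


def use_gpu_package(requirements_text, version="0.12.1"):
    """Replace the CPU pin `tensorflow==<version>` with `tensorflow-gpu==<version>`."""
    pattern = re.compile(r"^tensorflow==" + re.escape(version) + r"$")
    lines = []
    for line in requirements_text.splitlines():
        if pattern.match(line.strip()):
            lines.append("tensorflow-gpu==" + version)
        else:
            lines.append(line)  # leave every other requirement untouched
    return "\n".join(lines) + "\n"


# Hypothetical file content, only to show the effect.
sample = "numpy\ntensorflow==0.12.1\n"
print(use_gpu_package(sample))
```

To apply it, read requirements.txt, pass its text through the function, and write the result back before running pip install.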

You might also want to look here: (https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/refs/heads/r0.12/tensorflow/g3doc/get_started/os_setup.md)

HPI-DeepLearning / crnn-lid

GPU-Training supported? #9