pltrdy opened this issue 7 years ago
@pltrdy, try changing the following line https://github.com/abisee/pointer-generator/blob/master/run_summarization.py#L125 to `default_device = tf.device('/gpu:0')`
Thx @pedrobalage, I actually removed this block. I don't really understand why the device is hardcoded. Shouldn't it be left as the default (so the user can set or unset CUDA_VISIBLE_DEVICES), or at least be a parameter?
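For what it's worth, a minimal sketch of the environment-variable approach (assuming a standard CUDA setup; the variable must be set before TensorFlow initializes CUDA, i.e. before the first TF import in the process):

```python
import os

# CUDA reads this variable at initialization time, so set it
# before the first TensorFlow import in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose only GPU 0

# To force CPU-only execution instead, hide every GPU:
# os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # import TF only after the variable is set
```

Equivalently, on the command line: `CUDA_VISIBLE_DEVICES=0 python run_summarization.py ...`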
@pltrdy, I didn't understand this decision either. This was the only way it worked for me. I also set CUDA_VISIBLE_DEVICES.
@pedrobalage thanks for sharing it, I hadn't even noticed this line.
@abisee I forked, switched to Python 3 syntax, and removed this line. My Python 3 implementation won't be backward-compatible because of bytes vs. str differences, so I'm not sure a PR would be welcome for that.
https://github.com/pltrdy/pointer-generator
Thanks for pointing this out @pltrdy. Removing that line allows the code to run on the GPU, and it's going 20x faster. So you pretty much made my life go by 20 times faster 👍
Another related discussion can be found in #3
@ioana-blue In fact I just complained, then @pedrobalage helped us ;)
@tianjianjiang thx for referencing it. Interesting to note that it's the same GPU. I can't figure out why this works on a K40 (used by the author) and not on a 1080, since there is enough RAM. Maybe the CUDA/TF versions have some effect here. No idea.
@pltrdy You're welcome and I can also confirm that changing it from CPU to GPU can be worse on K80 and GT 650M.
BTW, there's a recent post about this topic: https://datascience.stackexchange.com/questions/19220/choosing-between-cpu-and-gpu-for-training-a-neural-network Also, my GT 650M and K80 experience generally supports not using a slow GPU with little RAM, and being careful with GPUs under virtualization (e.g. AWS EC2 and Google Cloud).
Hi everyone,
I'm afraid this will be a rather unsatisfactory answer, but the reason we set the default device to CPU is that our code is derived from the TextSum code, which does the same thing (see this line). Similarly, we have a few `with tf.device("/gpu:0"):` lines in model.py that are derived from similar lines in this file.
Personally I'm not sure what the rationale for doing this is (I think I tried removing these lines and didn't see a difference). But that's the code we ran for our experiments, so we left it like that for reproducibility, in case it matters.
Edit: When we ran on our K40, the GPU-Util was non-zero so the GPU was being used.
I'm new to everything related to this work, but learning. From what I read in the docs and from looking at the code, this is my understanding.
https://www.tensorflow.org/tutorials/using_gpu explains that `with tf.device` tries to force certain computations onto a certain device, instead of the default device chosen by TF (which, as far as I understand, prioritizes GPUs over CPUs). If the specified device cannot be allocated, you get an exception, UNLESS you use `allow_soft_placement=True`, which our code does: https://github.com/abisee/pointer-generator/blob/master/util.py#L26
That's why you can see varying behavior depending on where you run the code, what's running concurrently, etc.
Hope this explains some of the mystery.
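To make that concrete, here is roughly the TF 1.x pattern being described (a sketch of the session configuration, not the exact code from util.py):

```python
import tensorflow as tf

# Sketch of the placement behavior described above (TF 1.x API).
config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back to another device instead of raising
    log_device_placement=True,   # log which device each op actually lands on
)

with tf.device('/gpu:0'):        # request GPU placement for these ops...
    x = tf.constant(1.0)         # ...soft placement may still move them

sess = tf.Session(config=config)
print(sess.run(x))
```

With `log_device_placement=True`, the placement log shows whether an op actually ran on `/gpu:0` or was silently moved to the CPU.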
I'm not entirely sure if I was able to make it to run on GPU. @ioana-blue @abisee @pltrdy @pedrobalage @tianjianjiang How long is a training step taking you approximately? For me every training step is about 1.5 seconds, is that the speed you're getting on GPU or the CPU?
@Spandan-Madan I'm around 1.8 sec/step with default parameters on an NVIDIA 1080. GPU usage might be sub-optimal, as we can see the load is not always at 100%.
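If anyone wants a quick sanity check of seconds/step independent of TF's own logging, a generic timer like this works (`run_step` here is a stand-in for whatever executes one training step; it's not a function from the repo):

```python
import time

def seconds_per_step(run_step, n_steps=100):
    """Average wall-clock seconds per call to run_step."""
    start = time.time()
    for _ in range(n_steps):
        run_step()
    return (time.time() - start) / n_steps

# Dummy example with a cheap stand-in step:
avg = seconds_per_step(lambda: sum(range(1000)), n_steps=50)
print("%.6f sec/step" % avg)
```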
It's better to switch to the GPU, since the default setting makes training extremely slow.
Strange fact. I've been trying to train summarization with default parameters and default dataset.
It turns out that the GPU is detected (`2017-06-01 17:57:05.548650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)`), and some GPU memory is allocated for the process, but nothing is running on it (load: 0%). It sometimes peaks at around 27% for a very short time. The GPU is a single 1080 with 8 GB of RAM (4617 MiB allocated by the process).
Any idea?