Open DayanJ opened 6 years ago
What are your:
OS version, CUDA version, Python version, GPU, and have you got tensorflow gpu or cpu installed (or both)?
You can track this issue here, also: https://github.com/Rayhane-mamah/Tacotron-2/issues/73 https://github.com/Rayhane-mamah/Tacotron-2/issues/87
I have the same issue when I try to run this on CPU.
@DayanJ According to your log, the device placement is CPU, and I guess the CPU version of the op supports only NHWC order.
If you're going to use your GPU, you should fix whatever prevents TensorFlow from placing your op on the GPU, probably by uninstalling the CPU version of TensorFlow.
If you're going to use the CPU, I guess you can fastfix this by reordering the dimensions before this op using tf.transpose
like this https://stackoverflow.com/questions/37689423/convert-between-nhwc-and-nchw-in-tensorflow
Upd: tried fastfix, didn't work, I am TensorFlow noob, don't believe me. Upd2: something like that, probably. https://github.com/gloriouskilka/Tacotron-2-fork/commit/a56b3007e109fae439ec752d9a9ef5384732c789
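For reference, the reordering idea comes down to choosing the right perm argument for tf.transpose. Here is a minimal pure-Python sketch of how that permutation can be derived. The helper names layout_perm and permute_shape are hypothetical (they are not part of TensorFlow or this repo), and the example shape is made up:

```python
def layout_perm(src, dst):
    """Permutation that reorders axes from layout `src` to layout `dst`.

    Output axis i is taken from input axis perm[i], which is the
    convention tf.transpose(x, perm=...) uses.
    """
    return [src.index(axis) for axis in dst]


def permute_shape(shape, perm):
    """Shape a tensor of `shape` would have after transposing with `perm`."""
    return tuple(shape[i] for i in perm)


to_nhwc = layout_perm("NCHW", "NHWC")
to_nchw = layout_perm("NHWC", "NCHW")

# Hypothetical NCHW shape: batch=2, channels=1, height=80, width=100
nchw_shape = (2, 1, 80, 100)
nhwc_shape = permute_shape(nchw_shape, to_nhwc)

print(to_nhwc, to_nchw, nhwc_shape)
```

With TensorFlow available, the same list would be passed as tf.transpose(x, perm=to_nhwc). Note that transposing the input alone is not enough: the op's data_format has to match the new layout as well.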
Hello @DayanJ, as suggested by @gloriouskilka, please make sure you only have tensorflow gpu version installed. This is most likely a bug that occurs when you are trying to use CPU on Wavenet.
Hi, @gloriouskilka , I am DayanJ, this is my new account. I hadn't installed tf gpu before. I tried this advice (gloriouskilka@a56b300). It works only when using the CPU. @Rayhane-mamah @DanRuta @gloriouskilka Thanks for your help.
@Hayes515 Hi! My fastfix is a bad idea, just a proof of concept. You should switch to an Nvidia GPU, if you have one, because otherwise you will be training your network on CPU until the end of days, I think.
Usually people install both tensorflow and tensorflow-gpu, and sometimes the CPU version of tensorflow prevents the GPU from being used, so the main advice is: uninstall tensorflow, install only tensorflow-gpu.
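A quick way to check whether both packages ended up in the same environment is sketched below. It assumes Python 3.8+ (for importlib.metadata from the standard library), and installed_versions is a hypothetical helper name, not something from this repo:

```python
from importlib import metadata


def installed_versions(names):
    """Return {package name: version} for each listed package that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment; skip it.
            pass
    return found


found = installed_versions(["tensorflow", "tensorflow-gpu"])
if len(found) == 2:
    print("Both packages are installed:", found)
    print("Consider uninstalling tensorflow and keeping only tensorflow-gpu.")
else:
    print("Installed:", found)
```

On older Python versions, running pip list in the same environment gives the same information.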
@gloriouskilka Hi! You are right, I have switched to the Nvidia GPU, but it took me two days to finish it. I installed some packages with Anaconda3 in a new environment, T3. This way is convenient.
The condition of my GPU is below.
Thank you!
@Hayes515 Yay! You're welcome!
I guess we can close this issue, because it already contains all possible solutions with nice screenshots.
One last thing before closing this, @Hayes515 you may want to keep your 2nd gpu free as it is holding the model graph for no particular reason. To do this, please add os.environ["CUDA_VISIBLE_DEVICES"] = "0" in the following location: https://github.com/Rayhane-mamah/Tacotron-2/blob/e244457bf5a9fa8d308e97d61cf3dc1933575488/train.py#L36-L37
That will prevent the run from seeing your 2nd GPU, it seems your graphic display is handled by it so there you go :) Naturally if you want to make multiple runs in parallel you can follow my comment here.
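A minimal sketch of that setting is shown here. The device index "0" is the one suggested above; the exact place to put it in train.py is at the linked lines, and the key point is that it runs before TensorFlow initializes the GPUs:

```python
import os

# Expose only the first GPU to this process. Set this before TensorFlow is
# imported so that the second GPU is never picked up by the run.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import TensorFlow only after the variable is set

print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

The same variable can also be set in the shell when launching the script, which avoids editing the code at all.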
Feel free to close the issue if no other problems are related to this issue. Thanks for using our work ;)
I needed to run the model on CPU for testing purposes (because the machine with a GPU is currently occupied by another variation of this model), so I would be glad if it could run on CPU.
It looks like the "channel" part of the transposed convolution input is temporarily inserted here:
And here:
I guess that this issue, the restriction of the CPU implementation of Conv2DTranspose, can be worked around by inserting a new dimension as the last dimension (axis=3, NHWC) instead of as the second dimension (axis=1, NCHW). (Also don't forget to change data_format to channels_last.)
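To make the difference between the two axes concrete, here is a small pure-Python sketch of the resulting shapes. The helper expand_dims_shape is hypothetical and only mimics what tf.expand_dims does to a shape for non-negative axes; the [batch, channels, time] shape is a made-up example, not taken from the repo:

```python
def expand_dims_shape(shape, axis):
    """Shape after inserting a size-1 dimension at a non-negative `axis`."""
    if not 0 <= axis <= len(shape):
        raise ValueError("axis out of range for this sketch")
    new_shape = list(shape)
    new_shape.insert(axis, 1)
    return tuple(new_shape)


# Hypothetical conditioning features: [batch, channels, time]
features = (2, 80, 100)

# axis=1: the new dimension sits in the channel slot of NCHW
# (data_format channels_first).
nchw_shape = expand_dims_shape(features, 1)

# axis=3: the new dimension sits in the channel slot of NHWC
# (data_format channels_last).
nhwc_shape = expand_dims_shape(features, 3)

print(nchw_shape, nhwc_shape)
```

In both cases the size-1 dimension plays the role of the single input channel; only its position changes, which is why data_format has to change together with the axis.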
I'm not entirely sure, because this is based on the assumption that these are the only instances where Conv2DTranspose is used, and I haven't gotten used to this code base yet. Also, I'm not sure how this is fundamentally different from the "fastfix" mentioned by @gloriouskilka. I would really appreciate it if someone could confirm whether this is the right way to go.
Hi guys,
I'm trying to get WaveNet training working and I keep getting this problem. I can't find where it comes from and don't know how to fix it. Is there an update on this issue? I only have tensorflow-gpu installed. Tacotron training went through without any problems.
Exiting due to exception: Conv2DCustomBackpropInputOp only supports NHWC. [[node WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput (defined at /notebooks/Tacotron-2/wavenet_vocoder/models/wavenet.py:557) = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/ShapeN, WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/ExpandDims_1, WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Squeeze_grad/Reshape)]]
Caused by op 'WaveNet_model/optimizer_1/gradients/WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D_grad/Conv2DBackpropInput', defined at:
File "train.py", line 138, in
...which was originally created as op 'WaveNet_model/inference/final_convolution_2/final_convolution_2_1/final_convolution_2/conv1d/Conv2D', defined at:
File "train.py", line 138, in
I used LJSpeech-1.1 data to test.
1. After I ran 'Python3 wavenet_preprocess.py', I can get these files.
2. I modified 'hparams.py' and set "train_with_GTA" to False.
3. After I ran "Python3 train --model='WaveNet'", I got these errors.
My tensorflow version is 1.7.1 and I can't fix this error.