Error in Train.py: I've been trying to figure it out for an hour now

milkmilkMilktea commented 5 years ago

So everything runs fine until the line: batch_loss, batch_acc = model.train_on_batch([title_batch, past_batch], [pred_batch])

I figured it was a syntax error or something, so I did some research. Although I hadn't seen it used this way before, there was nothing wrong with the syntax. So I updated all of my libraries and tried again. Still the same error. No matter what I did, I couldn't get it to work. If anyone cares, or is even out there (this is not a maintained repository, so maybe I'm alone), your help would be immensely appreciated.

Here's the full error log. pls help me

Traceback (most recent call last):
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
    return fn(*args)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1305, in _run_fn
    self._extend_graph()
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU], Registered kernels:
  [[Node: cu_dnngru_1/CudnnRNN = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=87654321, seed2=0](cu_dnngru_1/transpose, cu_dnngru_1/ExpandDims_1, cu_dnngru_1/Const_1, cu_dnngru_1/concat)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/(Censored)/PycharmProjects/YouTubeCommenter/Train.py", line 170, in
    batch_loss, batch_acc = model.train_on_batch([title_batch, past_batch], [pred_batch])
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1449, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2947, in __call__
    if hasattr(get_session(), '_make_callable_from_options'):
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 204, in get_session
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU], Registered kernels:
  [[Node: cu_dnngru_1/CudnnRNN = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=87654321, seed2=0](cu_dnngru_1/transpose, cu_dnngru_1/ExpandDims_1, cu_dnngru_1/Const_1, cu_dnngru_1/concat)]]

Caused by op 'cu_dnngru_1/CudnnRNN', defined at:
  File "C:/Users/(Censored)/PycharmProjects/YouTubeCommenter/Train.py", line 94, in
    x = CuDNNGRU(200, return_sequences=USE_OUT_SEQ)(x)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\layers\recurrent.py", line 533, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\base_layer.py", line 450, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\layers\cudnn_recurrent.py", line 90, in call
    output, states = self._process_batch(inputs, initial_state)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\layers\cudnn_recurrent.py", line 297, in _process_batch
    is_training=True)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\cudnn_rnn\python\ops\cudnn_rnn_ops.py", line 1621, in __call__
    seed=self._seed)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\cudnn_rnn\python\ops\cudnn_rnn_ops.py", line 1010, in _cudnn_rnn_no_input_c
    direction, dropout, seed, name)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\cudnn_rnn\python\ops\cudnn_rnn_ops.py", line 922, in _cudnn_rnn
    outputs, output_h, output_c, _ = gen_cudnn_rnn_ops.cudnn_rnn(**args)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_cudnn_rnn_ops.py", line 143, in cudnn_rnn
    is_training=is_training, name=name)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\(Censored)\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU], Registered kernels:
  [[Node: cu_dnngru_1/CudnnRNN = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=87654321, seed2=0](cu_dnngru_1/transpose, cu_dnngru_1/ExpandDims_1, cu_dnngru_1/Const_1, cu_dnngru_1/concat)]]

HackerPoet commented 5 years ago

This network can only be trained on a GPU because it uses CuDNNGRU. The error sounds like you're trying to run it on a CPU: the log lists only [CPU] under "Registered devices". You may be able to fix the problem by replacing that layer with a regular GRU.
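A minimal sketch of that swap, assuming the Keras 2.x / TensorFlow 1.x setup shown in the traceback. The helper names (`gpu_available`, `recurrent_layer_name`, `build_recurrent_layer`) are hypothetical and not part of Train.py; the idea is to pick the layer class based on whether TensorFlow reports a GPU device.

```python
def gpu_available():
    """Return True if TensorFlow reports at least one GPU device."""
    try:
        # device_lib ships with TensorFlow; treat a missing install as "no GPU".
        from tensorflow.python.client import device_lib
    except ImportError:
        return False
    return any(d.device_type == "GPU" for d in device_lib.list_local_devices())


def recurrent_layer_name(has_gpu):
    """Name of the Keras layer to use: CuDNNGRU needs a GPU, GRU runs anywhere."""
    return "CuDNNGRU" if has_gpu else "GRU"


def build_recurrent_layer(units, return_sequences, has_gpu):
    """Instantiate the chosen layer (hypothetical helper, assumes Keras 2.x)."""
    import keras.layers  # imported lazily so the helpers above work without Keras
    layer_cls = getattr(keras.layers, recurrent_layer_name(has_gpu))
    return layer_cls(units, return_sequences=return_sequences)
```

With helpers like these, line 94 of Train.py could read `x = build_recurrent_layer(200, USE_OUT_SEQ, gpu_available())(x)`. One caveat: weights saved from a CuDNNGRU layer are generally only loadable into a GRU configured with `reset_after=True` and `recurrent_activation='sigmoid'`, so this sketch is aimed at training from scratch.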

milkmilkMilktea commented 5 years ago

Thank you so, so much @HackerPoet! How would I run it on my GPU? In the past I've just used Google Colab, but there must be another way?

milkmilkMilktea commented 5 years ago

Okay, I did a ton of research and found a tutorial. I'll make sure to close the issue if it works

milkmilkMilktea commented 5 years ago

Whoops, I meant to link something, guess it didn't work: https://www.codingforentrepreneurs.com/blog/install-tensorflow-gpu-windows-cuda-cudnn/

HackerPoet commented 5 years ago

Installing CUDA and cuDNN is best because it will train significantly faster. But if you can't get it working, just replace the instance of 'CuDNNGRU' with 'GRU'.

milkmilkMilktea commented 5 years ago

After installing 5 different versions of CUDA and cuDNN, I just gave up and used GRU and it worked great! By chance could you tell me what versions you use for future reference?

Ajeet-Yadav commented 4 years ago

Make sure you have an Nvidia graphics card, as CUDA and cuDNN are Nvidia libraries.
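One way to check this from Python is to ask the `nvidia-smi` tool that ships with the Nvidia driver. This is a rough sketch using only the standard library; the function name `list_nvidia_gpus` is made up for illustration, and it assumes `nvidia-smi` is on the PATH, which is not always the case on Windows.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one description string per GPU reported by `nvidia-smi -L`.

    An empty list means either no Nvidia GPU/driver was found or
    nvidia-smi is not on the PATH.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    print(gpus if gpus else "No Nvidia GPU detected via nvidia-smi")
```

An empty result does not prove there is no Nvidia card, only that the driver tool could not be found or run, so it is worth checking the device manager as well before giving up on CUDA.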

HackerPoet / YouTubeCommenter

Error in Train.py: I've been trying to figure it out for an hour now #3