Skuldur / Classical-Piano-Composer

Can't Run with Tensorflow-gpu #37

Open octagonalsquare opened 4 years ago

octagonalsquare commented 4 years ago

I have an RTX 2070, so I want to make sure this runs on my GPU, since that will be significantly faster than the CPU. But after installing tensorflow-gpu, CUDA, and all the other dependencies listed on the TensorFlow website, it throws this:

Traceback (most recent call last):
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1339, in _run_fn
    self._extend_graph()
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by {{node cu_dnnlstm_1/CudnnRNN}}with these attrs: [seed=87654321, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", is_training=true, seed2=0]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[cu_dnnlstm_1/CudnnRNN]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Users\chhou\Desktop\Classical-Piano-Composer-master\lstm.py", line 125, in <module>
    train_network()
  File "D:\Users\chhou\Desktop\Classical-Piano-Composer-master\lstm.py", line 27, in train_network
    train(model, network_input, network_output)
  File "D:\Users\chhou\Desktop\Classical-Piano-Composer-master\lstm.py", line 122, in train
    model.fit(network_input, network_output, epochs=200, batch_size=128, callbacks=callbacks_list)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\engine\training.py", line 1213, in fit
    self._make_train_function()
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\engine\training.py", line 333, in _make_train_function
    **self._function_kwargs)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\backend\tensorflow_backend.py", line 3006, in function
    v1_variable_initialization()
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\backend\tensorflow_backend.py", line 420, in v1_variable_initialization
    session = get_session()
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\backend\tensorflow_backend.py", line 385, in get_session
    return tf_keras_backend.get_session()
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\keras\backend.py", line 462, in get_session
    _initialize_variables(session)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\keras\backend.py", line 879, in _initialize_variables
    [variables_module.is_variable_initialized(v) for v in candidate_vars])
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by node cu_dnnlstm_1/CudnnRNN (defined at C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\layers\cudnn_recurrent.py:517) with these attrs: [seed=87654321, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", is_training=true, seed2=0]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[cu_dnnlstm_1/CudnnRNN]]

Errors may have originated from an input operation.
Input Source operations connected to node cu_dnnlstm_1/CudnnRNN:
 cu_dnnlstm_1/transpose (defined at C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\layers\cudnn_recurrent.py:484)   
 cu_dnnlstm_1/ExpandDims_1 (defined at C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\layers\cudnn_recurrent.py:487)    
 cu_dnnlstm_1/ExpandDims_2 (defined at C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\layers\cudnn_recurrent.py:488)    
 cu_dnnlstm_1/concat_1 (defined at C:\Users\chhou\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\keras\layers\cudnn_recurrent.py:60)

I don't really know what to make of this as it seems to be coming from files within the various packages required by the composer.

NOTE: This is with tensorflow-gpu 1.14, as most forums said to downgrade to that if you receive the ModuleNotFoundError: No module named 'tensorflow.contrib' error, which I was getting with the latest version.

duhaime commented 4 years ago

@octagonalsquare It looks like tensorflow-gpu was only able to find your CPU device, not your NVIDIA card. Do you get output when you run nvidia-smi? If not, either you don't have an NVIDIA card or something is misconfigured...
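
Since nvidia-smi only shows what the driver sees, it can also help to ask TensorFlow which devices it registered. The following is a minimal sketch, not something from this repository: the helper names `list_tf_devices` and `has_plain_gpu` are made up for illustration, and it assumes `tensorflow.python.client.device_lib` is importable in your install. The traceback above reports `Registered devices: [CPU]`, and the `CudnnRNN` op needs a device of type `GPU`, so that is what the check looks for.

```python
def list_tf_devices():
    """Return (device_type, name) pairs for every device TensorFlow registers.

    Returns None when TensorFlow cannot be imported at all.
    """
    try:
        # device_lib ships with TensorFlow; imported lazily so that a
        # missing install is reported as None instead of crashing.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.device_type, d.name) for d in device_lib.list_local_devices()]


def has_plain_gpu(devices):
    """True only if some device has the type exactly 'GPU'.

    Other types such as 'CPU' or 'XLA_GPU' do not count here.
    """
    return any(device_type == "GPU" for device_type, _ in devices or [])
```

Calling `has_plain_gpu(list_tf_devices())` in the same Python environment that runs lstm.py should return True before training is expected to work on the GPU. If it returns False while nvidia-smi shows the card, the problem is more likely in the CUDA/cuDNN setup than in the hardware.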

Skuldur commented 4 years ago

Hi,

Which version of CUDA do you have installed? It's likely that the version you have is incompatible with Tensorflow 1.14.

octagonalsquare commented 4 years ago

> @octagonalsquare It looks like tensorflow-gpu was only able to find your CPU device, not your NVIDIA card. Do you get output when you run nvidia-smi? If not, either you don't have an NVIDIA card or something is misconfigured...

Yes, I get output. I was able to get it to detect my GPU using tf.config.list_physical_devices('XLA_GPU'), but I don't know what to do with it at that point.

> Hi,
>
> Which version of CUDA do you have installed? It's likely that the version you have is incompatible with Tensorflow 1.14.

I'm not sure how to check, and I don't remember which version it was when I installed it.
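
One way to check the installed CUDA toolkit version is to run `nvcc --version` and read the "release" number from its output. The sketch below wraps that in Python; it assumes `nvcc` is on your PATH (it may not be if the toolkit's bin directory was never added), and the function names `parse_nvcc_release` and `installed_cuda_version` are hypothetical helpers, not part of any library. As far as I know, the prebuilt tensorflow-gpu 1.14 packages are listed against CUDA 10.0 in TensorFlow's tested build configurations, so it is worth comparing the result with that table.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'release X.Y' version string from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_version():
    """Return the CUDA toolkit version reported by nvcc.

    Returns None if nvcc is not on PATH or its output has no release number.
    """
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.7)
    )
    return parse_nvcc_release(result.stdout)
```

A result of None usually just means nvcc could not be found, not that CUDA is missing. In that case, looking for the toolkit's install folder by hand is the fallback.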