Open soheilsh7 opened 2 years ago
So, apparently there is a problem with TensorFlow and NVIDIA 30-series GPUs. I'm training the model with the same parameters in another environment with tensorflow-cpu==1.15 and it works fine, though I still don't know how to solve the mentioned problem with tensorflow-gpu.
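For anyone who wants the same CPU-only behaviour without creating a second environment, here is a minimal sketch. It assumes that hiding the GPU through the standard `CUDA_VISIBLE_DEVICES` environment variable is enough to make tensorflow-gpu fall back to the CPU. The helper names `cpu_only_env` and `train_command` are hypothetical and exist only for illustration; the module name and flags are the ones from the training command in this post.

```python
import os
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    Setting CUDA_VISIBLE_DEVICES to -1 makes the CUDA runtime report no
    usable GPUs, so TensorFlow places every op on the CPU.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def train_command(data_path, model_path, python=sys.executable):
    """Build the argument list for the training module (hypothetical helper)."""
    return [
        python,
        "-m",
        "learning_to_simulate.train",
        "--data_path={}".format(data_path),
        "--model_path={}".format(model_path),
    ]


# Usage (not executed here, because it starts a full training run):
#   import subprocess
#   subprocess.run(
#       train_command("./learning_to_simulate/tmp/datasets/WaterDrop/",
#                     "./learning_to_simulate/tmp/models/WaterDrop"),
#       env=cpu_only_env(),
#       check=True,
#   )
```

This only sidesteps the GPU; it does not fix the cuBLAS failure itself, and training on the CPU will be slower.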
I am trying to train a model using the following command:
python3.6 -m learning_to_simulate.train --data_path=./learning_to_simulate/tmp/datasets/WaterDrop/ --model_path=./learning_to_simulate/tmp/models/WaterDrop
and I get the following error:
2022-11-10 13:53:37.015640: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 37778160 exceeds 10% of system memory.
2022-11-10 13:53:37.084779: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/soheilsh/anaconda3/envs/simulate/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/soheilsh/anaconda3/envs/simulate/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/soheilsh/anaconda3/envs/simulate/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1356, 30), b.shape=(30, 128), m=1356, n=128, k=30
     [[{{node EncodeProcessDecode/graph_independent/node_model/sequential/mlp/linear_0/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1356, 30), b.shape=(30, 128), m=1356, n=128, k=30
     [[{{node EncodeProcessDecode/graph_independent/node_model/sequential/mlp/linear_0/MatMul}}]]
     [[truediv_4/_4847]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "anaconda3/envs/simulate/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
...
GPU : NVIDIA GeForce RTX 3060 Laptop
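In case it helps anyone reproduce this, here is a small sketch for collecting the GPU name and driver version alongside the model name above. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the `PATH`; the function name `gpu_info` is hypothetical. It returns `None` rather than raising if the tool is missing or fails, and it sticks to Python 3.6 compatible calls.

```python
import shutil
import subprocess


def gpu_info():
    """Return one 'name, driver_version' string per GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text mode; works on Python 3.6
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    # Keep only non-empty lines, one per detected GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# Usage:
#   print(gpu_info())
```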
How can I solve this problem?
Many thanks in advance for your response :)