CEA-LIST / N2D2

N2D2 is an open source CAD framework for Deep Neural Network simulation and full DNN-based applications building.

Can not learn with CUDA #29

Closed vankhoa21991 closed 5 years ago

vankhoa21991 commented 5 years ago

Hello, when I run cmake, I have to remove everything in the exec folder except n2d2.cpp; otherwise I get this error: "add_executable cannot create target "n2d2" because another target with the same name already exists."

Then I ran make as usual. But when I test the model, I hit the error below with CUDA, and if I switch to Frame only, the model does not learn. Do you know what I did wrong? Thanks

sudo ./build/bin/n2d2 models/mnist24_16c4s2_24c5s2_150_10.ini -learn 40000000 -log 100000
Option -log: number of steps between logs [100000]
Option -learn: number of backprop learning steps [40000000]
Loading network configuration file models/mnist24_16c4s2_24c5s2_150_10.ini

Layer: conv1 [Conv(Frame_CUDA)]
Notice: Could not open configuration file: conv1.cfg
    Shared synapses: 256
    Virtual synapses: 30976
    Inputs dims: 24 24 1
    Outputs dims: 11 11 16
Warning: No monitor could be added to Cell: conv1

Layer: conv2 [Conv(Frame_CUDA)]
Notice: Could not open configuration file: conv2.cfg
    Shared synapses: 2250
    Virtual synapses: 36000
    Inputs dims: 11 11 16
    Outputs dims: 4 4 24
Warning: No monitor could be added to Cell: conv2

Layer: fc1 [Fc(Frame_CUDA)]
Notice: Could not open configuration file: fc1.cfg
    Synapses: 57600
    Inputs dims: 4 4 24
    Outputs dims: 1 1 150
Warning: No monitor could be added to Cell: fc1

Layer: fc1.drop [Dropout(Frame_CUDA)]
Notice: Could not open configuration file: fc1.drop.cfg
    Inputs dims: 1 1 150
    Outputs dims: 1 1 150
Warning: No monitor could be added to Cell: fc1.drop

Layer: fc2 [Fc(Frame_CUDA)]
Notice: Could not open configuration file: fc2.cfg
    Synapses: 1500
    Inputs dims: 1 1 150
    Outputs dims: 1 1 10
Warning: No monitor could be added to Cell: fc2

Layer: softmax [Softmax(Frame_CUDA)]
Notice: Could not open configuration file: softmax.cfg
    Inputs dims: 1 1 10
    Outputs dims: 1 1 10
Target: softmax (target value: 1 / default value: 0 / top-n value: 1)
Warning: No monitor could be added to Cell: softmax

Total number of neurons: 2640
Total number of nodes: 2640
Total number of synapses: 61606
Total number of virtual synapses: 126076
Total number of connections: 126076
Notice: Unused section softmax.Target in INI file
CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED (1) in /home/kevin/IMRA_le/3_Program/SNN/N2D2/include/CudaContext.hpp:58
Time elapsed: 1.79893 s
Error: CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED (1) in /home/kevin/IMRA_le/3_Program/SNN/N2D2/include/CudaContext.hpp:58

vankhoa21991 commented 5 years ago

When I run nvidia-smi, I get this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:03:00.0  On |                  N/A |
| 22%   51C    P0    75W / 250W |    526MiB / 12211MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

olivierbichler-cea commented 5 years ago

Hello, do you have CuDNN properly installed? What is your CuDNN version?
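One way to answer the version question without rebuilding is to read the version macros from the CuDNN header. The sketch below is only illustrative: it assumes a CuDNN 7.x style `cudnn.h` that defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, and the helper name `cudnn_version_from_header` is hypothetical, not part of N2D2 or CuDNN.

```python
import re

# Macro names as they appear in a CuDNN 7.x cudnn.h (assumption: other
# releases may keep them in a different header file).
VERSION_MACROS = ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL")


def cudnn_version_from_header(header_text):
    """Return 'major.minor.patch' parsed from #define lines, or None."""
    parts = []
    for name in VERSION_MACROS:
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            # A macro is missing, so this is not the header we expected.
            return None
        parts.append(match.group(1))
    return ".".join(parts)


# Hypothetical header excerpt, used here instead of a real file.
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1
"""

print(cudnn_version_from_header(sample_header))  # prints 7.4.1
```

In practice the text would be read from the `cudnn.h` file under the CUDA include directory of the machine, whose location varies by installation.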

vankhoa21991 commented 5 years ago

This is the result from cmake:

sudo cmake -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6 ..
-- cotire 1.8.0 loaded.
-- No PugiXML found
-- MongoDB not found.
-- CuDNN library status:
--     version: 7.4.1
--     include path: /usr/local/cuda/include
--     libraries: /usr/local/cuda/lib64/libcudnn.so
-- Configuring done
-- Generating done

olivierbichler-cea commented 5 years ago

It looks like your driver version is not compatible with your CuDNN version, according to the CuDNN support matrix: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html

vankhoa21991 commented 5 years ago

Thank you. I now have CuDNN 7.4.1, CUDA 9.2, and driver 390.116. Should I downgrade the driver to 384.11 or downgrade CUDA to 9.0? It looks like my driver is not in this table.

olivierbichler-cea commented 5 years ago

According to the table, you should upgrade your driver to r396.26. I recommend upgrading it if you can, instead of downgrading other things.
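As a rough illustration of the check being made here, the sketch below compares the installed driver version from this thread (390.116) with the minimum quoted from the support matrix (396.26). The function names are hypothetical, and the minimum is taken from this comment only, so it should be confirmed against the matrix for any other CUDA/CuDNN combination.

```python
def parse_version(text):
    """Turn a dotted version string such as '390.116' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_is_sufficient(installed, required):
    """True if the installed driver is at least the required version."""
    # Tuples compare component by component, so '396.9' < '396.26',
    # which a plain float or string comparison would get wrong.
    return parse_version(installed) >= parse_version(required)


installed_driver = "390.116"  # reported by nvidia-smi in this thread
required_driver = "396.26"    # minimum quoted above for this setup

print(driver_is_sufficient(installed_driver, required_driver))  # prints False
```

Comparing integer tuples rather than raw strings or floats keeps the ordering correct when the minor component has a different number of digits.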

olivierbichler-cea commented 5 years ago

Learning in Frame only should work with the latest version of N2D2. There was a bug that has since been fixed.

olivierbichler-cea commented 5 years ago

Closing the issue, as this is a driver problem. Please feel free to re-open it if necessary.