ShuangLiu1992 closed this issue 7 years ago.
This is just an artifact of how cuDNN allocates memory and picks algorithms to run. You could try calling set_dnn_prefer_smallest_algorithms(), which tells cuDNN to use less memory. That might make it behave in a less confusing way.
hmmmm, that's odd, thank you! I will try set_dnn_prefer_smallest_algorithms()
Hello Davis, I am getting the same message with dnn_semantic_segmentation_train_ex. I tried downsampling the crop size from 227x227 to 101x101, but then the loss computation for the gradient descent throws an error. I also tried set_dnn_prefer_smallest_algorithms(); with no success. What unexplored options are left?
Make batch sizes smaller or reduce the size of the network. There are a lot of options.
Ok, it is working now :) but the process is taking too long. The batch size was reduced from 30 to 4, that being the largest value I empirically found to be feasible.
Do you have any documentation that could give me a hint about the required training time?
How long does dnn_semantic_segmentation_train_ex normally take to train on Pascal VOC?
How long would you say it would take on a Quadro M500M?
How do I know if the network is converging?
Thank you Davis
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Debug\deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro M500M"
CUDA Driver Version / Runtime Version 9.1 / 9.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 1124 MHz (1.12 GHz)
Memory Clock rate: 900 MHz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 6 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
These things can take several days to train on the fastest GPUs. I don't know how fast your GPU is going to be, probably a lot slower.
The solver does automatic convergence checking so don't worry about it. It's explained here: http://blog.dlib.net/2018/02/automatic-learning-rate-scheduling-that.html
I am renting a P5000 on Paperspace (via Parsec), and it is now running with the original mini-batch size of 30; the average loss is consistently falling :D
Do you find it worthwhile to tune the momentum or learning rate?
I usually leave those at their defaults. But you can try changing them to see what happens.
Hello Davis, I'm testing the new dnn face detector on my images, and I noticed that for some batch sizes it reports:
Error while calling cudaMalloc(&backward_data_workspace, backward_data_workspace_size_in_bytes) in file dlib/dnn/cudnn_dlibapi.cpp:908. code: 2, reason: out of memory
However, it goes away if I set the batch size to an even higher number, and the batch size that reproduces the error seems to be random. Please find attached my code to reproduce the error with Ubuntu 16, CUDA 8.0, GCC 5.4, OpenCV 3.0 and 640x360 images; batch size 4 leads to out of memory while batch size 16 doesn't. imgs is a
std::vector<cv::Mat>
holding RGB versions of the test images.