Open artyom-beilis opened 3 years ago
Anybody here?
@artyom-beilis Thanks for your patch! I tried it the same way as https://github.com/BVLC/caffe/issues/6970, but ran into much higher memory utilization with cudnn8. After some tests I tried a model with a single conv layer and a (20, 3, 1280, 720) input; it's the "head" of a ResNet used for a detection task. With cuda10 and cudnn7.6 I observed about 1.7 GB usage for a forward pass; with cuda11 and cudnn8, ~2.6 GB. Maybe the comparison is not fully fair, because different GPUs were used: a Titan XP in the first case and a 3060 in the second. Have you seen something like this with the 3070 and 1080? Thank you!
Hi, I noticed larger memory use as well. It looks related to cudnn8 in general: I see a clear difference when I build the same code with cudnn7 vs cudnn8. Also make sure you use the latest alignment fix, i.e. the latest branch: https://github.com/artyom-beilis/caffe/commits/fixes_for_cudnn8_bvlc_master
Also, caffe in general is a memory hog. AFAIR I noticed the difference in memory use between cudnn7 and cudnn8 with other frameworks as well. Artyom
AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.
Could you tell me more about the other frameworks? I tried to find mentions of similar GPU memory problems, but without success.
I don't really remember; it was either pytorch or mxnet. It was a long time ago.
Anyway, thank you! )
Following this, as the current caffe I built with nvcr.io/nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04 and OpenPose results in a much larger GPU RAM footprint on an AWS G5 (Ampere).
I tried the proposed changes to make cuDNN8 work, but training immediately fails with the following error:
I0120 10:25:10.763470 1539595 solver.cpp:60] Solver scaffolding done.
I0120 10:25:10.765404 1539595 caffe.cpp:239] Starting Optimization
I0120 10:25:10.765410 1539595 solver.cpp:292] Solving squeezenet-ssd
I0120 10:25:10.765413 1539595 solver.cpp:293] Learning Rate Policy: poly
F0120 10:25:10.835502 1539595 cudnn_conv_layer.cu:118] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
@ 0x7f09cdf8f1c3 google::LogMessage::Fail()
@ 0x7f09cdf9425b google::LogMessage::SendToLog()
@ 0x7f09cdf8eebf google::LogMessage::Flush()
@ 0x7f09cdf8f6ef google::LogMessageFatal::~LogMessageFatal()
@ 0x7f09ce7753f0 caffe::CuDNNConvolutionLayer<>::Backward_gpu()
@ 0x7f09ce711c6a caffe::Net<>::BackwardFromTo()
@ 0x7f09ce711da5 caffe::Net<>::Backward()
@ 0x7f09ce6ecaab caffe::Solver<>::Step()
@ 0x7f09ce6ed492 caffe::Solver<>::Solve()
@ 0x55739e9b4a7a train()
@ 0x55739e9b1eac main
@ 0x7f09cd2fb083 __libc_start_main
@ 0x55739e9b290e _start
Ubuntu 20.04, NVIDIA GeForce RTX 3060 12 GB, Driver Version: 510.108.03, CUDA Version: 11.6, cuDNN Version: 8.6
Build without cuDNN runs without problems.
Support of CuDNN8
Some of the API used by Caffe was removed in cudnn8; without cudnn8 support it is impossible to run Caffe on the Ampere architecture.
It required:
The change was tested on
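The main removal affecting Caffe's convolution layers is cudnnGetConvolutionForwardAlgorithm (and its backward counterparts): cuDNN 8 instead exposes cudnnGetConvolutionForwardAlgorithm_v7, which returns a list of cudnnConvolutionFwdAlgoPerf_t entries sorted by expected runtime, and the caller must pick the first algorithm whose workspace fits the limit itself. A minimal sketch of that selection logic, using a stand-in struct rather than the real cuDNN type so it compiles without a GPU:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cudnnConvolutionFwdAlgoPerf_t: each candidate algorithm
// is reported with a status, a time estimate, and the workspace it needs.
struct AlgoPerf {
    int algo;           // algorithm id
    bool ok;            // status == CUDNN_STATUS_SUCCESS
    float time_ms;      // expected runtime (the list is sorted by this)
    std::size_t memory; // required workspace bytes
};

// Pick the fastest algorithm whose workspace fits the limit -- the
// selection a cuDNN 8 caller now has to do itself after calling
// cudnnGetConvolutionForwardAlgorithm_v7. Returns -1 if none fits.
int pick_algo(const std::vector<AlgoPerf>& perf, std::size_t workspace_limit) {
    for (const AlgoPerf& p : perf) {
        if (p.ok && p.memory <= workspace_limit)
            return p.algo;  // first fit wins: the list is time-sorted
    }
    return -1;
}
```

In real code the list would come from cudnnGetConvolutionForwardAlgorithm_v7 and this loop replaces what the removed cudnnGetConvolutionForwardAlgorithm did when asked for CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT; the fit-to-workspace detail is also a plausible contributor to the memory differences discussed above.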