apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Stuck with "Mxnet running performance tests to find the best convolution algorithm" #20213

Closed SusBioRes-UBC closed 3 years ago

SusBioRes-UBC commented 3 years ago

Hello, I'm trying to train AlexNet on ImageNet from scratch, but I've been stuck at the following message for hours and can't proceed to the next step: src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while… (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

My system is: 10-core 3.70 GHz Intel Core i9-10900X, GeForce 2080 Ti, 64 GB DDR4, Ubuntu 18.04

Any suggestion is highly appreciated, thanks!
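For readers who do want to skip the cuDNN autotuning pass mentioned in the log message, here is a minimal sketch of setting the variable from Python. It assumes the variable is set before MXNet is imported, which is the conservative choice; the commented-out import is only a placeholder for the rest of a training script. With autotuning disabled, MXNet falls back to a default convolution algorithm, which may run slower than the tuned one.

```python
import os

# Disable cuDNN convolution autotuning, as suggested by the log message.
# Set this before importing mxnet so the setting is in place when
# convolution operators are created.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

# import mxnet as mx  # import MXNet only after the variable is set
```

The same effect can be had by exporting the variable in the shell before launching the training script.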

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

ptrendx commented 3 years ago

We would need more information to help you. For example, which versions of MXNet, CUDA and cuDNN are you using? Details about your training script would also be very useful, as would the output of nvidia-smi while the training seems stuck. How do you know that the network is not proceeding to "the next step"? Did you add any prints or other ways of tracking progress?

SusBioRes-UBC commented 3 years ago

Thank you for the prompt response @ptrendx. Sorry, I forgot to check the log file, and it seems the training was running normally. The seemingly "stuck" situation is probably because I did not print anything to the terminal. Sorry for the false alarm. I will close the issue. Regards,
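Since the root cause here was a training loop that printed nothing to the terminal, a small progress logger avoids the same confusion. The sketch below is framework-agnostic and not taken from the original script: `run_epoch`, `train_step`, and `log_every` are hypothetical names, and `train_step` stands in for whatever performs the forward/backward pass and optimizer update.

```python
import sys
import time


def run_epoch(batches, train_step, log_every=50, out=sys.stdout):
    """Call train_step on every batch and print throughput every log_every batches.

    All names here are illustrative; train_step is any callable taking one batch.
    Returns the number of batches processed.
    """
    start = time.time()
    seen = 0
    for i, batch in enumerate(batches, start=1):
        train_step(batch)
        seen += 1
        if i % log_every == 0:
            elapsed = max(time.time() - start, 1e-9)  # avoid division by zero
            # flush=True so the line appears immediately, even when piped
            print(f"batch {i}: {i / elapsed:.1f} batches/s", file=out, flush=True)
    return seen
```

For example, `run_epoch(data_iter, step_fn, log_every=100)` would print one line every 100 batches, which is enough to tell a slow start (such as the autotuning pass) apart from a genuine hang.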