Closed: SusBioRes-UBC closed this issue 3 years ago.
We would need more information to help you: for example, which versions of MXNet, CUDA and cuDNN you are using. Details about your training script would also be very useful, as would the output of nvidia-smi while the training seems stuck. How do you know that the network is not proceeding to "the next step"? Did you add any prints or other ways of seeing progress there?
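One way to tell whether the GPU is actually busy while the terminal looks frozen is to poll nvidia-smi. The sketch below is a hypothetical helper, not part of MXNet. It assumes nvidia-smi is on the PATH and uses its --query-gpu and --format=csv,noheader,nounits options. Sustained high utilization.gpu suggests that kernels (including the cuDNN performance tests) are still running rather than hung.

```python
import subprocess

# Standard nvidia-smi --query-gpu field names requested by this sketch.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def _to_number(value):
    """Convert a field to int, keeping non-numeric values such as "[N/A]"."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_gpu_query(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU
    (hypothetical helper)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append({name: _to_number(v) for name, v in zip(QUERY_FIELDS, values)})
    return gpus


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)
```

Calling query_gpus() every few seconds from a second terminal while training runs gives a rough liveness signal without touching the training script itself.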
Thank you for the prompt response @ptrendx. Sorry, I forgot to check the log file, and it seems the training was running normally. I guess the apparent "stuck" state was because I did not print anything to the terminal. Sorry for the false alarm; I will close the issue. Regards,
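Since the progress was going only to a log file, a minimal sketch of sending the same records to both the file and the terminal with Python's standard logging module follows. It assumes the training script reports progress through logging (MXNet's Speedometer callback, for example, logs via logging.info); the setup_logging helper and the log path are hypothetical.

```python
import logging
import sys


def setup_logging(log_path, level=logging.INFO):
    """Attach a file handler and a console handler to the root logger,
    so every record is written to log_path and echoed to stdout."""
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)

    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(file_handler)
    root.addHandler(console_handler)
    return root
```

For example, calling setup_logging("train.log") once at startup and then logging.info("epoch %d batch %d", epoch, i) inside the loop keeps the file log while making progress visible in the terminal.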
Hello, I'm trying to train AlexNet on ImageNet from scratch, but I have been stuck on the following message for hours and the training never seems to proceed to the next step:
src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while… (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
My system: 10-core 3.70 GHz Intel Core i9-10900X, GeForce 2080 Ti, 64 GB DDR4, Ubuntu 18.04.
Any suggestions are highly appreciated. Thanks!
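The log message itself names the switch: MXNET_CUDNN_AUTOTUNE_DEFAULT. A minimal sketch of setting it from Python follows, assuming it runs before mxnet is imported. The helper and its mode names are hypothetical; the names mirror the values of the Convolution operator's cudnn_tune argument. Exporting MXNET_CUDNN_AUTOTUNE_DEFAULT=0 in the shell before launching the script should have the same effect. Turning the tests off skips the algorithm search, at the possible cost of slower convolutions.

```python
import os

# Values of MXNET_CUDNN_AUTOTUNE_DEFAULT (the default is 1):
#   "0" -> no performance tests; no autotuning of the convolution algorithm
#   "1" -> run performance tests, limited to the default workspace
#   "2" -> run performance tests, pick the fastest algorithm even if it
#          needs more workspace than the default limit
AUTOTUNE_MODES = {"off": "0", "limited_workspace": "1", "fastest": "2"}


def set_cudnn_autotune(mode="off"):
    """Set MXNET_CUDNN_AUTOTUNE_DEFAULT for this process (hypothetical helper).

    Call this before `import mxnet` so the setting is in place before any
    convolution layer is created.
    """
    value = AUTOTUNE_MODES[mode]
    os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = value
    return value
```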