apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.78k stars 6.79k forks

A100 gluon_net.load_params() blocking #20758

Open rylynchen opened 2 years ago

rylynchen commented 2 years ago

Description

On an NVIDIA V100 the code works well. On an NVIDIA A100 it blocks in gluon_net.load_params without raising any error. I'm sure the params file exists and the GPU is available.

Has anyone else had the same problem?

github-actions[bot] commented 2 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

HaibaraEs commented 2 years ago

Which operating system and CUDA version are you using? I ran into exactly the same problem, because CUDA 10.x does not support RTX 30 series graphics cards, and MXNet does not support CUDA 11.x on Windows.

rylynchen commented 2 years ago

OS: CentOS 7; host CUDA: 11.4; Docker CUDA: 10
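
The versions reported in this thread can be checked against the minimum CUDA toolkit release that natively targets each GPU architecture. The following is a minimal sketch, not part of MXNet: the names `MIN_CUDA_FOR_CAPABILITY`, `parse_version`, and `toolkit_supports` are hypothetical, and the table covers only the GPUs mentioned here (V100 at compute capability 7.0, A100 at 8.0, RTX 30 series at 8.6) plus Turing at 7.5.

```python
# Minimum CUDA toolkit release with native support for each compute
# capability. Hypothetical helper table, limited to the GPUs in this thread.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 30 series
}


def parse_version(text):
    """Turn a version string such as '11.4' or '10' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def toolkit_supports(cuda_version, capability):
    """Return True if this CUDA toolkit version natively targets the GPU."""
    required = MIN_CUDA_FOR_CAPABILITY[capability]
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # Values taken from the thread: CUDA 10 in Docker, CUDA 11.4 on the host.
    print("V100 with CUDA 10:  ", toolkit_supports("10", (7, 0)))
    print("A100 with CUDA 10:  ", toolkit_supports("10", (8, 0)))
    print("A100 with CUDA 11.4:", toolkit_supports("11.4", (8, 0)))
```

Under these assumptions the check passes for the V100 and fails for the A100 with the container's CUDA 10, which matches the behaviour described above. One possible explanation for the apparent hang, not confirmed in this thread, is that a CUDA 10 build carries no native A100 kernels, so the driver has to JIT-compile them on first GPU use.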