apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Using mxnet on RTX3090 #19520

Open DestinyMy opened 3 years ago

DestinyMy commented 3 years ago

Hello,

I have some problems using MXNet on an RTX 3090. The 30-series GPUs only support CUDA 11, but I can't find an MXNet build that corresponds to CUDA 11. I've browsed a lot of blogs and documents, but I still haven't found a solution.

Thanks for your advice.

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

Wallart commented 3 years ago

Hello. In order to use MXNet on my RTX 3090, I had to build MXNet 1.8.0.rc1 with CUDA 11 support. If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1), so you can use MXNet without rebuilding everything.

ptrendx commented 3 years ago

Alternatively you can use the NGC container: https://ngc.nvidia.com/catalog/containers/nvidia:mxnet. Version 20.10 supports sm_86 (so the RTX 3000 series).
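For reference, pulling and running an NGC container typically looks like the following. The exact tag (20.10-py3) is an assumption here; check the NGC catalog for current tags, and note that --gpus all requires the NVIDIA Container Toolkit on the host.

```shell
# Pull the MXNet NGC container (tag 20.10-py3 is assumed; verify on the NGC catalog)
docker pull nvcr.io/nvidia/mxnet:20.10-py3

# Run with GPU access and confirm MXNet can see the GPU
docker run --gpus all -it --rm nvcr.io/nvidia/mxnet:20.10-py3 \
    python -c "import mxnet as mx; print(mx.context.num_gpus())"
```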

DestinyMy commented 3 years ago

Thank you very much. I'll try it right away.

dai-ichiro commented 3 years ago

Check this site. https://dist.mxnet.io/python/cu110

If your OS is Linux, you can install a nightly build of MXNet:

pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110
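After installing, a quick sanity check looks something like this (it assumes the wheel imports cleanly and a CUDA device is visible; run it in the GPU environment):

```python
import mxnet as mx

print(mx.__version__)           # the installed nightly version string
print(mx.context.num_gpus())    # greater than 0 if a CUDA device is detected

# Allocate a small array on the first GPU to exercise the CUDA path
a = mx.nd.ones((2, 2), ctx=mx.gpu(0)) * 2
print(a.asnumpy())
```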

Hope this helps.

shilei-nj commented 3 years ago

@Wallart I did the same thing. It worked, but the training speed on the 3090 is slower than on a 2080 Ti. Do you have the same problem?

Wallart commented 3 years ago

@shilei-nj With an equivalent batch size, the training speed on my 3090 is on par with my old 1080 Ti. What type of model are you training? Can you give some context about your MXNet version / build options?

I can try to reproduce. Since I'm mostly using the extra amount of VRAM, I might have missed performance issues.

shilei-nj commented 3 years ago

@Wallart I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1. You should update cuDNN from 8.0.4 to 8.0.5; this is very important. Also modify KNOWN_CUDA_ARCHS in the MXNet Makefile to add 86. Now it's really fast.
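The Makefile change described above would look roughly like this. The exact list of architectures varies by MXNet version, so treat this as a sketch; the key point is appending 86 (Ampere consumer GPUs such as the RTX 30xx):

```makefile
# In the MXNet Makefile: extend the known compute capabilities with 86
# (the other values shown are illustrative; keep whatever your version already lists)
KNOWN_CUDA_ARCHS := 30 35 50 52 60 61 70 75 80 86
```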

Light-- commented 3 years ago

version 20.10 supports sm_86 (so RTX3000 series).

That did NOT work for me. I tested all the containers.

szha commented 3 years ago

@shilei-nj thanks for pointing it out. Would you help add this change to the v1.x branch?

@Light-- could you file a bug report for the issue you are facing? We will need more details to identify the issue which are requested in the issue template. Thanks!

shilei-nj commented 3 years ago

@szha No thanks. Please just include the fix in your next update.

chinakook commented 3 years ago

MXNet 2.0 built from source is working fine with the RTX 3090 for me.

Light-- commented 3 years ago

If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1),

@Wallart Buddy, your Docker image runs like this:

$ sudo docker run --gpus all -ti a86ad560010f /bin/bash
Starting as 9001:deeplearning
deeplearning home directory ready
deeplearning home directory populated
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.

What is this? How do I use it? My other Docker images work fine.

Wallart commented 3 years ago

What is this? How do I use it? My other Docker images work fine.

The image runs an SSH daemon for remote debugging. You need to run the image in the background with the -itd options. Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash, or connect the container to your preferred IDE. You can also populate the container with a specific user/UID (in order to mount volumes) via -e HOST_USER=myUser -e HOST_UID=$(id -u).
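Put together, the workflow would look something like this. The container name mxnet-1.8.0.rc1 and user myUser are illustrative choices, not fixed by the image:

```shell
# Start the container detached (-d) with GPU access, setting a host-matching user/UID
docker run --gpus all -itd --name mxnet-1.8.0.rc1 \
    -e HOST_USER=myUser -e HOST_UID=$(id -u) \
    wallart/dl_mxnet:1.8.0.rc1

# Attach an interactive shell as that user
docker exec -it -u myUser mxnet-1.8.0.rc1 bash
```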

Wallart commented 3 years ago

@Wallart I have solved the problem, with MXNet 1.8.0.rc2 and cuda 11.1. You should update cudnn from 8.0.4 to 8.0.5, this is very important. And modify KNOWN_CUDA_ARCHS in MXNet Makefile, add 86. Now it's really fast.

How fast is it compared to your old build options? I will give it a try.

shilei-nj commented 3 years ago

@Wallart With the old options it was a little slower than a 2080 Ti; now it's 50% faster than a 2080 Ti.

Light-- commented 3 years ago

Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash or connect the container to your preferred IDE.

Hey @Wallart, what do you say about this?

I followed your steps, but:

$ sudo docker exec -it -u root 37239180ff52 bash
root@37239180ff52:/tmp# python
Python 3.7.7 (default, Jun 26 2020, 05:10:03)
[GCC 7.3.0] :: Intel(R) Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import mxnet
Illegal instruction (core dumped)

Light-- commented 3 years ago

pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110

Hey @dai-ichiro, what do you say about this:

$ pip install mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: mxnet-cu110==1.9.0b20201116 from file:///home/user1/mxcuda11/mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl in /home/user1/.local/lib/python3.6/site-packages (1.9.0b20201116)
Requirement already satisfied: requests<3,>=2.20.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (2.25.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (0.8.4)
Requirement already satisfied: numpy<2.0.0,>1.16.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (1.19.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (1.21.1)
(mxgpu) user1@pc228:~/mxcuda11$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet

Illegal instruction (core dumped)

my environment:

Ubuntu 20.04.1 LTS
Linux pc 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:02:00.0 Off |                  N/A |
| 30%   30C    P8    28W / 350W |      1MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

$ pip list
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Package           Version
----------------- --------------
asn1crypto        0.22.0
certifi           2020.6.20
cffi              1.10.0
chardet           3.0.4
cryptography      1.8.1
dataclasses       0.7
future            0.18.2
graphviz          0.8.4
idna              2.6
mxnet-cu110       1.9.0b20201116
numpy             1.19.4
packaging         16.8
Pillow            8.0.1
pip               20.2.2
pycparser         2.18
pyOpenSSL         17.0.0
pyparsing         2.2.0
PySocks           1.6.6
requests          2.25.0
setuptools        36.4.0
six               1.10.0
torch             1.7.0+cu110
torchvision       0.8.1+cu110
typing-extensions 3.7.4.3
urllib3           1.21.1
wheel             0.29.0
chinakook commented 3 years ago

I think it's time to add CUDA 11.1 and sm_86 to the official MXNet support matrix, since the RTX 3090 series is very popular.

Wallart commented 3 years ago

@Light-- What type of CPU are you using ?

Light-- commented 3 years ago

What type of CPU are you using ?

@Wallart Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
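For what it's worth, "Illegal instruction (core dumped)" on import is commonly caused by a binary compiled with CPU instructions (e.g. AVX2/FMA) that an older CPU lacks; the i7-3960X is Sandy Bridge-E, which predates AVX2. A quick generic diagnostic on Linux (this is not part of MXNet, just a sketch for checking CPU features):

```python
import pathlib

def cpu_flags():
    """Return the CPU feature flags reported by the Linux kernel (empty off-Linux)."""
    try:
        text = pathlib.Path("/proc/cpuinfo").read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
# A wheel built with -mavx2 / -mfma will crash with SIGILL on CPUs missing these flags
for feature in ("avx", "avx2", "fma"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```

If avx2 reports "no", a wheel or container built for newer CPUs would explain the crash, and building MXNet from source without those flags (or on the target machine) would be the workaround.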

TristonC commented 3 years ago

CUDA 11.1 and sm_86 are supported in MXNet 1.8+. @DestinyMy Has your problem been solved?

TNTran92 commented 2 years ago

@TristonC Do you have a Windows version of MXNet 1.8+? I only see Linux wheels on PyPI: https://pypi.org/project/mxnet-cu112/