Using mxnet on RTX3090 #19520

Open DestinyMy opened 3 years ago

DestinyMy commented 3 years ago


I am having trouble using MXNet on an RTX 3090. 30-series GPUs only support CUDA 11, but I can't find an MXNet build that corresponds to CUDA 11. I've read through a lot of blogs and documentation, but I still haven't found a solution.

Thanks for your advice.

Wallart commented 3 years ago

Hello. To use MXNet on my RTX 3090, I had to build MXNet 1.8.0.rc1 with CUDA 11 support. If you are familiar with Docker, I've published an image (docker pull wallart/dl_mxnet:1.8.0.rc1) so you can use MXNet without rebuilding everything.

ptrendx commented 3 years ago

Alternatively you can use the NGC container: https://ngc.nvidia.com/catalog/containers/nvidia:mxnet , version 20.10 supports sm_86 (so RTX3000 series).

DestinyMy commented 3 years ago

Thank you very much. I'll try it right away.

dai-ichiro commented 3 years ago

Check this site. https://dist.mxnet.io/python/cu110

If your OS is Linux, you can install the nightly version of mxnet.

pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110

Hope this helps.
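After installing a CUDA build like the one above, a quick check that MXNet actually sees the GPU can save time. The following is a minimal sketch, assuming the MXNet 1.x Python API (mx.context.num_gpus, mx.nd.ones, mx.gpu); the function name describe_mxnet_gpu is made up for this example. It only handles a missing package; a crash during import cannot be caught this way.

```python
def describe_mxnet_gpu():
    """Return a short text report on whether the installed MXNet can use a GPU."""
    try:
        import mxnet as mx
    except ImportError:
        return "mxnet is not installed in this environment"

    lines = ["mxnet version: " + mx.__version__]
    n_gpus = mx.context.num_gpus()
    lines.append("visible GPUs: %d" % n_gpus)
    if n_gpus > 0:
        # Allocate a small array on the first GPU and pull the result back
        # to the host, so CUDA errors surface here instead of mid-training.
        x = mx.nd.ones((2, 2), ctx=mx.gpu(0))
        lines.append("sum on gpu(0): %s" % (x * 2).sum().asscalar())
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_mxnet_gpu())
```

If the report shows zero visible GPUs, the installed wheel is probably a CPU-only build or the driver is not visible to the process.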

shilei-nj commented 3 years ago

@Wallart I did the same thing and it worked, but training on the 3090 is slower than on the 2080 Ti. Do you have the same problem?

Wallart commented 3 years ago

@shilei-nj With an equivalent batch size, the training speed on the 3090 is equivalent to my old 1080 Ti. What type of model are you training? Can you give some context about your MXNet version and build options?

I can try to reproduce. As I'm mostly using the extra VRAM, I might have missed performance issues.

shilei-nj commented 3 years ago

@Wallart I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1. You should update cuDNN from 8.0.4 to 8.0.5; this is very important. Also add 86 to KNOWN_CUDA_ARCHS in the MXNet Makefile. Now it's really fast.
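To illustrate what adding 86 to the architecture list means for the compiler, here is a small sketch that turns a list of compute capabilities into nvcc -gencode flags. The -gencode arch=compute_XX,code=sm_XX syntax is nvcc's own; the helper name gencode_flags and the starting list of architectures are assumptions for illustration, not a copy of what the MXNet Makefile contains.

```python
def gencode_flags(archs):
    """Build one nvcc -gencode flag per compute capability (e.g. 86 -> sm_86)."""
    flags = []
    for arch in archs:
        flags.append("-gencode arch=compute_{0},code=sm_{0}".format(arch))
    return flags


if __name__ == "__main__":
    # Hypothetical existing list; the real KNOWN_CUDA_ARCHS may differ.
    known_archs = [70, 75, 80]
    for flag in gencode_flags(known_archs + [86]):
        print(flag)
```

Without an sm_86 entry, the build carries no native code for the RTX 30 series, which is one plausible reason the first build ran slower than expected.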

Light-- commented 3 years ago

version 20.10 supports sm_86 (so RTX3000 series).

That did NOT work. I tested all the containers.

szha commented 3 years ago

@shilei-nj thanks for pointing it out. Would you help add this change to the v1.x branch?

@Light-- could you file a bug report for the issue you are facing? To identify the issue we will need more details, which are requested in the issue template. Thanks!

shilei-nj commented 3 years ago

@szha No thanks, please just fix it in your next update.

chinakook commented 3 years ago

MXNet 2.0, which I built myself, is working fine with the RTX 3090.

Light-- commented 3 years ago

Wallart commented 3 years ago

I'm providing an SSH daemon for remote debugging purposes. You need to run the image in the background with the -itd options. Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash or connect the container to your preferred IDE. You can also populate your container with a specific user/uid in order to mount volumes, with -e HOST_USER=myUser -e HOST_UID=$(id -u).
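The options above can be put together as in the sketch below, which only builds and prints the command lines without running Docker. The image name, -itd, HOST_USER and HOST_UID come from this thread; the --gpus all flag (available in Docker 19.03 and later with the NVIDIA container toolkit) and the --name value are assumptions, as are the helper function names.

```python
import os
import shlex


def docker_run_command(image="wallart/dl_mxnet:1.8.0.rc1",
                       name="mxnet-1.8.0.rc1",
                       host_user="myUser"):
    """Return the argv list for starting the container in the background."""
    return [
        "docker", "run", "-itd",
        "--gpus", "all",                     # assumption: expose all GPUs
        "--name", name,                      # assumption: name used by docker exec
        "-e", "HOST_USER=" + host_user,
        "-e", "HOST_UID=%d" % os.getuid(),   # same value as $(id -u)
        image,
    ]


def docker_exec_command(name="mxnet-1.8.0.rc1", user="myUser"):
    """Return the argv list for opening a shell in the running container."""
    return ["docker", "exec", "-it", "-u", user, name, "bash"]


if __name__ == "__main__":
    for argv in (docker_run_command(), docker_exec_command()):
        print(" ".join(shlex.quote(arg) for arg in argv))
```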

Wallart commented 3 years ago

@Wallart I have solved the problem, with MXNet 1.8.0.rc2 and cuda 11.1. You should update cudnn from 8.0.4 to 8.0.5, this is very important. And modify KNOWN_CUDA_ARCHS in MXNet Makefile, add 86. Now it's really fast.

How fast is it compared to your old build options? I will give it a try.

shilei-nj commented 3 years ago

@Wallart With the old options it was a little bit slower than the 2080 Ti; now it's 50% faster than the 2080 Ti.
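Comparisons like this are easier to reproduce with a fixed measurement procedure. Below is a minimal sketch of a throughput timer; the function name samples_per_second and its parameters are hypothetical. Because MXNet executes operations asynchronously, the optional sync callback is meant to be something like mx.nd.waitall so that queued GPU work is included in the timing.

```python
import time


def samples_per_second(step, batch_size, iters=50, warmup=5, sync=None):
    """Time `iters` calls of `step` and return processed samples per second."""
    for _ in range(warmup):
        step()                 # warm-up runs are excluded from the timing
    if sync is not None:
        sync()                 # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()                 # wait for queued work to actually finish
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with one training step on a fixed batch.
    rate = samples_per_second(lambda: sum(range(10000)), batch_size=32)
    print("%.1f samples/s" % rate)
```

Running the same step function and batch size on both builds (or both cards) gives numbers that can be compared directly.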

Light-- commented 3 years ago

Hey @Wallart, what do you make of this?

I followed your steps, but importing mxnet crashes:

$ sudo docker exec -it -u root 37239180ff52 bash
root@37239180ff52:/tmp# python
Python 3.7.7 (default, Jun 26 2020, 05:10:03)
[GCC 7.3.0] :: Intel(R) Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import mxnet
Illegal instruction (core dumped)

Light-- commented 3 years ago

$ pip install mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: mxnet-cu110==1.9.0b20201116 from file:///home/user1/mxcuda11/mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl in /home/user1/.local/lib/python3.6/site-packages (1.9.0b20201116)
Requirement already satisfied: requests<3,>=2.20.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (2.25.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (0.8.4)
Requirement already satisfied: numpy<2.0.0,>1.16.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (1.19.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (1.21.1)
(mxgpu) user1@pc228:~/mxcuda11$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet

Illegal instruction (core dumped)

my environment:

Ubuntu 20.04.1 LTS
Linux pc 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 3090    On   | 00000000:02:00.0 Off |                  N/A |
| 30%   30C    P8    28W / 350W |      1MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

$ pip list
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Package           Version
----------------- --------------
asn1crypto        0.22.0
certifi           2020.6.20
cffi              1.10.0
chardet           3.0.4
cryptography      1.8.1
dataclasses       0.7
future            0.18.2
graphviz          0.8.4
idna              2.6
mxnet-cu110       1.9.0b20201116
numpy             1.19.4
packaging         16.8
Pillow            8.0.1
pip               20.2.2
pycparser         2.18
pyOpenSSL         17.0.0
pyparsing         2.2.0
PySocks           1.6.6
requests          2.25.0
setuptools        36.4.0
six               1.10.0
torch             1.7.0+cu110
torchvision       0.8.1+cu110
urllib3           1.21.1
wheel             0.29.0

chinakook commented 3 years ago

I think it's time to add CUDA 11.1 and sm_86 to the official MXNet support list, as the RTX 3090 series is very popular.

Wallart commented 3 years ago

@Light-- What type of CPU are you using?

Light-- commented 3 years ago

What type of CPU are you using ?

@Wallart Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
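The thread does not state the cause of the crash. One possibility, assumed here, is that the binary uses CPU instructions the processor lacks: the i7-3960X is a Sandy Bridge-E part that supports AVX but not AVX2 or FMA. The sketch below, with hypothetical helper names, reads the CPU flags from /proc/cpuinfo on Linux and imports a module in a child process so that a crash does not take down the calling interpreter.

```python
import subprocess
import sys


def parse_cpu_flags(cpuinfo_text):
    """Extract the set of CPU feature flags from /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def import_exit_code(module_name):
    """Import a module in a child process and return the child's exit code.

    On POSIX a negative code -N means the child was killed by signal N;
    on Linux, -4 corresponds to SIGILL (illegal instruction).
    """
    result = subprocess.run([sys.executable, "-c", "import " + module_name])
    return result.returncode


def report(cpuinfo_path="/proc/cpuinfo", module_name="mxnet"):
    """Print which SIMD flags are present and how the import ended."""
    with open(cpuinfo_path) as f:
        flags = parse_cpu_flags(f.read())
    for name in ("avx", "avx2", "fma"):
        print("%s: %s" % (name, "yes" if name in flags else "no"))
    print("import %s exit code: %d" % (module_name, import_exit_code(module_name)))
```

Calling report() on the affected machine would show whether avx2 is missing and whether the import ends with exit code -4; if both hold, a build compiled without those instructions would be the next thing to try.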

TristonC commented 3 years ago

CUDA 11.1 and sm_86 are supported in MXNet 1.8+. @DestinyMy Has your problem been solved?

TNTran92 commented 2 years ago

@TristonC, do you have a Windows version of MXNet 1.8+? I only saw Linux builds on PyPI: https://pypi.org/project/mxnet-cu112/