[Open] DestinyMy opened this issue 3 years ago
Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.
Hello, in order to use MXNet on my RTX 3090, I had to build MXNet 1.8.0.rc1 with CUDA 11 support. If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1), so you can use MXNet without rebuilding everything.
Alternatively you can use the NGC container: https://ngc.nvidia.com/catalog/containers/nvidia:mxnet ; version 20.10 supports sm_86 (so the RTX 3000 series).
Thank you very much. I'll try it right away.
Check this site: https://dist.mxnet.io/python/cu110
If your OS is Linux, you can install a nightly version of MXNet:
pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110
Hope this helps.
@Wallart I did the same thing; it worked, but the training speed on the 3090 is slower than on the 2080 Ti. Do you have the same problem?
@shilei-nj With an equivalent batch size, the training speed on the 3090 is equivalent to my old 1080 Ti. What type of model are you training? Can you give some context about your MXNet version / build options?
I can try to reproduce; since I'm mostly using the extra amount of VRAM, I might have missed performance issues.
@Wallart I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1. You should update cuDNN from 8.0.4 to 8.0.5; this is very important. Also modify KNOWN_CUDA_ARCHS in the MXNet Makefile to add 86. Now it's really fast.
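For anyone following along, the Makefile change described above amounts to appending 86 (Ampere) to the architecture list. A rough sketch of what the edited line could look like; the actual list of values in your MXNet checkout may differ by version, only the variable name KNOWN_CUDA_ARCHS comes from this thread:

```makefile
# Makefile: add sm_86 (Ampere / RTX 3000 series) to the known architectures.
# The other values here are illustrative; keep whatever your checkout already lists.
KNOWN_CUDA_ARCHS := 30 35 50 60 62 70 80 86
```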
version 20.10 supports sm_86 (so the RTX 3000 series).
It did NOT work. I tested all the containers.
@shilei-nj thanks for pointing it out. Would you help add this change to the v1.x branch?
@Light-- could you file a bug report for the issue you are facing? We will need more details to identify the issue which are requested in the issue template. Thanks!
@szha no thanks, just fix it in your next update please.
MXNet 2.0 built by myself works fine with the RTX 3090.
If you are familiar with Docker, I've created an image (docker pull wallart/dl_mxnet:1.8.0.rc1),
@Wallart buddy, your Docker image runs like this:
$ sudo docker run --gpus all -ti a86ad560010f /bin/bash
Starting as 9001:deeplearning
deeplearning home directory ready
deeplearning home directory populated
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.
What is this? How do I use it? My other Docker images work fine.
I'm providing an SSH daemon for remote debugging purposes. You need to run the image in the background with the -itd options.
Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash
or connect the container to your preferred IDE.
You can also populate the container with a specific user/uid in order to mount volumes, with -e HOST_USER=myUser -e HOST_UID=$(id -u)
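Putting those steps together, a full invocation could look like the sketch below. The container name mxnet-1.8.0.rc1 and user myUser are illustrative placeholders; only the image name, the -itd advice, and the HOST_USER/HOST_UID variables come from the comment above:

```shell
# Start detached: the image runs an SSH daemon, so it must stay up in the background
docker run --gpus all -itd --name mxnet-1.8.0.rc1 \
  -e HOST_USER=myUser -e HOST_UID=$(id -u) \
  wallart/dl_mxnet:1.8.0.rc1

# Then open an interactive shell inside the running container as that user
docker exec -it -u myUser mxnet-1.8.0.rc1 bash
```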
@Wallart I have solved the problem with MXNet 1.8.0.rc2 and CUDA 11.1. You should update cuDNN from 8.0.4 to 8.0.5; this is very important. Also modify KNOWN_CUDA_ARCHS in the MXNet Makefile to add 86. Now it's really fast.
How fast is it compared to your old build options? I will give it a try.
@Wallart With the old options it was a little slower than the 2080 Ti; now it's 50% faster than the 2080 Ti.
Then you can execute docker exec -it -u USER mxnet-1.8.0.rc1 bash, or connect the container to your preferred IDE.
Hey @Wallart, what do you say about this? I followed your steps, but:
$ sudo docker exec -it -u root 37239180ff52 bash
root@37239180ff52:/tmp# python
Python 3.7.7 (default, Jun 26 2020, 05:10:03)
[GCC 7.3.0] :: Intel(R) Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import mxnet
Illegal instruction (core dumped)
pip install mxnet-cu110==1.9.0b20201116 -f https://dist.mxnet.io/python/cu110
Hey @dai-ichiro, what do you say about this:
$ pip install mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: mxnet-cu110==1.9.0b20201116 from file:///home/user1/mxcuda11/mxnet_cu110-1.9.0b20201116-py2.py3-none-manylinux2014_x86_64.whl in /home/user1/.local/lib/python3.6/site-packages (1.9.0b20201116)
Requirement already satisfied: requests<3,>=2.20.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (2.25.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (0.8.4)
Requirement already satisfied: numpy<2.0.0,>1.16.0 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from mxnet-cu110==1.9.0b20201116) (1.19.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2020.6.20)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (2.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/user1/anaconda3/envs/mxgpu/lib/python3.6/site-packages (from requests<3,>=2.20.0->mxnet-cu110==1.9.0b20201116) (1.21.1)
(mxgpu) user1@pc228:~/mxcuda11$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Illegal instruction (core dumped)
My environment:
Ubuntu 20.04.1 LTS
Linux pc 5.4.0-53-generic #59-Ubuntu SMP Wed Oct 21 09:38:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 On | 00000000:02:00.0 Off | N/A |
| 30% 30C P8 28W / 350W | 1MiB / 24265MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$ pip list
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Package Version
----------------- --------------
asn1crypto 0.22.0
certifi 2020.6.20
cffi 1.10.0
chardet 3.0.4
cryptography 1.8.1
dataclasses 0.7
future 0.18.2
graphviz 0.8.4
idna 2.6
mxnet-cu110 1.9.0b20201116
numpy 1.19.4
packaging 16.8
Pillow 8.0.1
pip 20.2.2
pycparser 2.18
pyOpenSSL 17.0.0
pyparsing 2.2.0
PySocks 1.6.6
requests 2.25.0
setuptools 36.4.0
six 1.10.0
torch 1.7.0+cu110
torchvision 0.8.1+cu110
typing-extensions 3.7.4.3
urllib3 1.21.1
wheel 0.29.0
I think it's time to get CUDA 11.1 and sm_86 into the official MXNet support list, as the RTX 3090 series is very popular.
@Light-- What type of CPU are you using?
What type of CPU are you using ?
@Wallart Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
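A hedged guess as to why the CPU matters here (not confirmed in this thread): the i7-3960X is a Sandy Bridge-E part, which supports AVX but not AVX2 or F16C, and an "Illegal instruction (core dumped)" on import is a classic symptom of a prebuilt wheel that uses newer instruction sets than the CPU has. A quick sketch to check which SIMD flags a Linux CPU reports; the helper name `missing_isa_flags` is made up for illustration:

```python
def missing_isa_flags(cpuinfo_text, required=("avx", "avx2", "f16c")):
    """Return the required instruction-set flags absent from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # each logical CPU has a "flags : ..." line listing its supported features
        if line.split(":")[0].strip() == "flags":
            flags.update(line.split(":", 1)[1].split())
    return [f for f in required if f not in flags]

# usage on Linux: missing_isa_flags(open("/proc/cpuinfo").read())
```

If this reports missing flags, building MXNet from source on that machine (so the compiler targets only what the CPU supports) is the usual workaround.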
CUDA 11.1 and sm_86 are supported in MXNet 1.8+. @DestinyMy Has your problem been solved?
@TristonC Do you have a Windows version of MXNet 1.8+? I only saw Linux wheels on PyPI: https://pypi.org/project/mxnet-cu112/
Hello,
I have some problems using MXNet on an RTX 3090. 30-series GPUs only support CUDA 11, but I can't find a CUDA 11 build of MXNet. I've browsed a lot of blogs and documents, but I still haven't found a solution.
Thanks for your advice.