ContinuumIO / anaconda-issues

Anaconda issue tracking

Very slow NVIDIA repo downloads #13092

Open divideconcept opened 1 year ago

divideconcept commented 1 year ago

What happened?

I installed PyTorch with the recommended conda instructions for GPU/CUDA support: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

All CUDA packages coming from the NVIDIA conda repo are extremely slow to download, down to a couple of kbps, which makes it almost impossible to install the required packages, which are hundreds of megabytes in size.

I ran the same test from 5 different internet connections (mainly in France) and on 5 different OS installations (3 Windows, 2 Linux) and got the same result every time: packages from the PyTorch repo download fast (250 Mbps), while packages from the NVIDIA repo never download faster than about 400 kbps.
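
For a sense of scale, the two reported rates can be turned into rough download times. This is only a back-of-the-envelope sketch: the 300 MB package size is a hypothetical round number (the report does not give exact sizes), and the rates are assumed to mean kilobits and megabits per second.

```python
def download_seconds(size_bytes: int, rate_bits_per_s: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bits_per_s


# Hypothetical 300 MB package (assumption, not a figure from the report).
PACKAGE_BYTES = 300 * 1000 * 1000

slow = download_seconds(PACKAGE_BYTES, 400e3)  # NVIDIA repo rate: 400 kbit/s
fast = download_seconds(PACKAGE_BYTES, 250e6)  # PyTorch repo rate: 250 Mbit/s

print(f"at 400 kbit/s: {slow / 60:.0f} minutes")
print(f"at 250 Mbit/s: {fast:.1f} seconds")
```

Under these assumptions a single package takes well over an hour at the slow rate and a few seconds at the fast one.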

Expected behavior or outcome

NVIDIA repo packages downloading at several Mbps instead of a couple of kbps.

Conda info

     active environment : base
    active env location : C:\Users\divide\miniconda3
            shell level : 1
       user config file : C:\Users\divide\.condarc
 populated config files : C:\Users\divide\.condarc
          conda version : 4.12.0
    conda-build version : not installed
         python version : 3.9.12.final.0
       virtual packages : __cuda=11.6=0
                          __win=0=0
                          __archspec=1=x86_64
       base environment : C:\Users\divide\miniconda3  (writable)
      conda av data dir : C:\Users\divide\miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\divide\miniconda3\pkgs
                          C:\Users\divide\.conda\pkgs
                          C:\Users\divide\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\divide\miniconda3\envs
                          C:\Users\divide\.conda\envs
                          C:\Users\divide\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.12.0 requests/2.27.1 CPython/3.9.12 Windows/10 Windows/10.0.22000
          administrator : False
             netrc file : None
           offline mode : False

Conda config

==> C:\Users\divide\.condarc <==
auto_activate_base: False
ssl_verify: False
channel_priority: disabled

Conda list

# packages in environment at C:\Users\divide\miniconda3:
#
# Name                    Version                   Build  Channel
brotlipy                  0.7.0           py39h2bbff1b_1003    defaults
ca-certificates           2022.3.29            haa95532_1    defaults
certifi                   2021.10.8        py39haa95532_2    defaults
cffi                      1.15.0           py39h2bbff1b_1    defaults
charset-normalizer        2.0.4              pyhd3eb1b0_0    defaults
colorama                  0.4.4              pyhd3eb1b0_0    defaults
conda                     4.12.0           py39haa95532_0    defaults
conda-content-trust       0.1.1              pyhd3eb1b0_0    defaults
conda-package-handling    1.8.1            py39h8cc25b3_0    defaults
console_shortcut          0.1.1                         4    defaults
cryptography              36.0.0           py39h21b164f_0    defaults
idna                      3.3                pyhd3eb1b0_0    defaults
menuinst                  1.4.18           py39h59b6b97_0    defaults
openssl                   1.1.1n               h2bbff1b_0    defaults
pip                       21.2.4           py39haa95532_0    defaults
powershell_shortcut       0.0.1                         3    defaults
pycosat                   0.6.3            py39h2bbff1b_0    defaults
pycparser                 2.21               pyhd3eb1b0_0    defaults
pyopenssl                 22.0.0             pyhd3eb1b0_0    defaults
pysocks                   1.7.1            py39haa95532_0    defaults
python                    3.9.12               h6244533_0    defaults
pywin32                   302              py39h2bbff1b_2    defaults
requests                  2.27.1             pyhd3eb1b0_0    defaults
ruamel_yaml               0.15.100         py39h2bbff1b_0    defaults
setuptools                61.2.0           py39haa95532_0    defaults
six                       1.16.0             pyhd3eb1b0_1    defaults
sqlite                    3.38.2               h2bbff1b_0    defaults
tqdm                      4.63.0             pyhd3eb1b0_0    defaults
tzdata                    2022a                hda174b7_0    defaults
urllib3                   1.26.8             pyhd3eb1b0_0    defaults
vc                        14.2                 h21ff451_1    defaults
vs2015_runtime            14.27.29016          h5e58377_2    defaults
wheel                     0.37.1             pyhd3eb1b0_0    defaults
win_inet_pton             1.1.0            py39haa95532_0    defaults
wincertstore              0.2              py39haa95532_2    defaults
yaml                      0.2.5                he774522_0    defaults

Additional information

No response

barabo commented 1 year ago

See: https://github.com/pytorch/pytorch/issues/88659#issuecomment-1308855453

divideconcept commented 1 year ago

It seemed to work this morning, but it's stuck again now. Same issue.

barabo commented 1 year ago

@divideconcept - Just checking - is this issue resolved?

barabo commented 1 year ago

Please re-open if this is ongoing. From our side it looks like it should be working as intended.

divideconcept commented 1 year ago

@barabo tried this morning and this evening, both successful. So all good !

gorkamunoz commented 1 year ago

Hi, I am having the exact same problem when doing: conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit

Find below the current progress (it's been ~1h since it started downloading). Also, some progress bars seem broken, for instance the libcublas one.

I understand a solution was found for the PyTorch problem, but in my case it is directly related to the cuda-toolkit. Any insight on this? Thanks!

image

hidoba commented 1 year ago

As a temporary workaround, you can use a VPN, as I've also mentioned here.

barabo commented 1 year ago

TL;DR - I wonder if this has something to do with EU transit from AWS.

Here's what I see when I use wget on OSX for libcublas-dev-11.10.1.25-0.tar.bz2:

download link

(base) canderson@carls-mbp-2 test % time wget https://anaconda.org/nvidia/libcublas-dev/11.10.1.25/download/win-64/libcublas-dev-11.10.1.25-0.tar.bz2
--2022-11-20 19:41:16--  https://anaconda.org/nvidia/libcublas-dev/11.10.1.25/download/win-64/libcublas-dev-11.10.1.25-0.tar.bz2
Resolving anaconda.org (anaconda.org)... 104.17.92.24, 104.17.93.24
Connecting to anaconda.org (anaconda.org)|104.17.92.24|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://binstar-cio-packages-prod.s3.amazonaws.com/6137debe5884df75bfa7428a/62f29bf519a6f124a4cdec61?response-content-disposition=attachment%3B%20filename%3D%22libcublas-dev-11.10.1.25-0.tar.bz2%22%3B%20filename%2A%3DUTF-8%27%27libcublas-dev-11.10.1.25-0.tar.bz2&response-content-type=application%2Fx-tar&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=600&X-Amz-Date=20221121T014106Z&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEHkaCXVzLWVhc3QtMSJGMEQCIDyIpKeN3T2m7YXRM7zEsy8MTNk7X6%2F%2BM6FvKuLOdzWEAiA4m0b5RPlrk0vPoRifQSZGB4YV%2F1Q2lb40dVSu5TQcWirVBAiB%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAAaDDQ1NTg2NDA5ODM3OCIM3mmJn4ohy4OLcIoVKqkEs5pqku6tYDMQ55sqL69wYU6PMegUdbO47Qqr0dopGPF8HPjCixSjEY3PZIw4zhSQTVNSeULVLk%2Bx%2Fb0eXz0OBRjD47c8wFdZmn5gVNvfyTPV1Plnnq44rJDWvNMAUysFC08SAmkL%2FJxNQ8DFP7lmBXwwI7T6IsxuGb6TrPyRfpnJMsBmvamVRxms6IZGfKxuNkZC8cY98uxiqwz4pP9Ps04lPKvW0uxG3w4PyPpJAd1b24sxcrjnEBT8Y%2FVnD0PX0wR%2B5JBOU5b3Q5MAwtlUCZkYrcBNO6szn8rYvqsPBBWCejWw%2B6tTIrmwpLqSgymoex8JjCapjpeZX4xnbsKuBZxdDERJL9hZodvTDB42h98DZeA8C%2FoRagOkfS7dCMqI5hjppKPXpDawWCpqOSNtrUuaFibQuVrNLBblySHb3i3FcdO3IoCVLqCeNLM%2BLBs3xOiCo9VbU0MgZGv7c5eiW6IECmNjfJOCIXtPW3RgXb7Rz01pZMarsXlhyzRjci77TgweI%2BofP4nlUtGVWpeGm1%2FtyZ90S%2FQUVkZIEghdRZVhPCDHaYN9FRGVlYUxIidmdMf%2FFPeiZSVuCwp4UQjkKA0RVftlNbiUtTpHpC7pg7sf9C2w0C0XhyUv5xqAkCsmiZyzRfjr0gDAJXipKPhDuGysqVDX1hOW8iaMeQKapiq73pDM%2F5U2akSTTQmgQtqCd3OoexOBxXZmkIl0XbaqkAC%2BwRolMCWhqjD2huubBjqqARvFrq6pgsDCJcYfhjMVl0ChpoDdXGP9WFCbP0G7uEn5Co7puO4aRwAtdn51N3hJh4n0GRc84iya8dlVglE6Xp5i8Kn3YKvTqdPhRXSTD9nBCnLJct4aPlMZ1eGorEwzKVjTOrbwvF3O0oRCZ4%2FH8YSFSxq36jkPhv35kLxUEiaHOgQgjZpRqJYnPCyy%2B48pFfsIbCxCRaTjw04yKsjbrdmfNtM7NnozWzzy&X-Amz-Credential=ASIAWUI46DZFBYQ5KTDK%2F20221121%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=fa8afff1cb570ea4fb3b6874f05c8d82168121d70491f9966b27da8f3354b391 [following]
--2022-11-20 19:41:17--  https://binstar-cio-packages-prod.s3.amazonaws.com/6137debe5884df75bfa7428a/62f29bf519a6f124a4cdec61?response-content-disposition=attachment%3B%20filename%3D%22libcublas-dev-11.10.1.25-0.tar.bz2%22%3B%20filename%2A%3DUTF-8%27%27libcublas-dev-11.10.1.25-0.tar.bz2&response-content-type=application%2Fx-tar&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=600&X-Amz-Date=20221121T014106Z&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEHkaCXVzLWVhc3QtMSJGMEQCIDyIpKeN3T2m7YXRM7zEsy8MTNk7X6%2F%2BM6FvKuLOdzWEAiA4m0b5RPlrk0vPoRifQSZGB4YV%2F1Q2lb40dVSu5TQcWirVBAiB%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAAaDDQ1NTg2NDA5ODM3OCIM3mmJn4ohy4OLcIoVKqkEs5pqku6tYDMQ55sqL69wYU6PMegUdbO47Qqr0dopGPF8HPjCixSjEY3PZIw4zhSQTVNSeULVLk%2Bx%2Fb0eXz0OBRjD47c8wFdZmn5gVNvfyTPV1Plnnq44rJDWvNMAUysFC08SAmkL%2FJxNQ8DFP7lmBXwwI7T6IsxuGb6TrPyRfpnJMsBmvamVRxms6IZGfKxuNkZC8cY98uxiqwz4pP9Ps04lPKvW0uxG3w4PyPpJAd1b24sxcrjnEBT8Y%2FVnD0PX0wR%2B5JBOU5b3Q5MAwtlUCZkYrcBNO6szn8rYvqsPBBWCejWw%2B6tTIrmwpLqSgymoex8JjCapjpeZX4xnbsKuBZxdDERJL9hZodvTDB42h98DZeA8C%2FoRagOkfS7dCMqI5hjppKPXpDawWCpqOSNtrUuaFibQuVrNLBblySHb3i3FcdO3IoCVLqCeNLM%2BLBs3xOiCo9VbU0MgZGv7c5eiW6IECmNjfJOCIXtPW3RgXb7Rz01pZMarsXlhyzRjci77TgweI%2BofP4nlUtGVWpeGm1%2FtyZ90S%2FQUVkZIEghdRZVhPCDHaYN9FRGVlYUxIidmdMf%2FFPeiZSVuCwp4UQjkKA0RVftlNbiUtTpHpC7pg7sf9C2w0C0XhyUv5xqAkCsmiZyzRfjr0gDAJXipKPhDuGysqVDX1hOW8iaMeQKapiq73pDM%2F5U2akSTTQmgQtqCd3OoexOBxXZmkIl0XbaqkAC%2BwRolMCWhqjD2huubBjqqARvFrq6pgsDCJcYfhjMVl0ChpoDdXGP9WFCbP0G7uEn5Co7puO4aRwAtdn51N3hJh4n0GRc84iya8dlVglE6Xp5i8Kn3YKvTqdPhRXSTD9nBCnLJct4aPlMZ1eGorEwzKVjTOrbwvF3O0oRCZ4%2FH8YSFSxq36jkPhv35kLxUEiaHOgQgjZpRqJYnPCyy%2B48pFfsIbCxCRaTjw04yKsjbrdmfNtM7NnozWzzy&X-Amz-Credential=ASIAWUI46DZFBYQ5KTDK%2F20221121%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=fa8afff1cb570ea4fb3b6874f05c8d82168121d70491f9966b27da8f3354b391
Resolving binstar-cio-packages-prod.s3.amazonaws.com (binstar-cio-packages-prod.s3.amazonaws.com)... 54.231.195.209, 52.217.167.145, 52.216.21.75, ...
Connecting to binstar-cio-packages-prod.s3.amazonaws.com (binstar-cio-packages-prod.s3.amazonaws.com)|54.231.195.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 300252986 (286M) [application/x-tar]
Saving to: ‘libcublas-dev-11.10.1.25-0.tar.bz2’

libcublas-dev-11.10.1.25-0.tar.b 100%[========================================================>] 286.34M  18.8MB/s    in 15s

2022-11-20 19:41:33 (19.0 MB/s) - ‘libcublas-dev-11.10.1.25-0.tar.bz2’ saved [300252986/300252986]

wget   0.29s user 1.42s system 10% cpu 16.364 total

Do you get the same slow performance when you download this file using the link above? Do you have a tool you can use to test this download from the command line? I believe Windows 10 and up comes with curl now.

Try curl -L -o test.tar.bz2 https://anaconda.org/nvidia/libcublas-dev/11.10.1.25/download/win-64/libcublas-dev-11.10.1.25-0.tar.bz2

(base) canderson@carls-mbp-2 test % curl -L --output foo.tar.bz2 --url https://anaconda.org/nvidia/libcublas-dev/11.10.1.25/download/win-64/libcublas-dev-11.10.1.25-0.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3951    0  3951    0     0  11851      0 --:--:-- --:--:-- --:--:-- 12194
100  286M  100  286M    0     0  21.1M      0  0:00:13  0:00:13 --:--:-- 22.4M

You can see in the wget output above that the transfer is coming from binstar-cio-packages-prod.s3.amazonaws.com. This is the S3 bucket that backs anaconda.org. Downloads that come from here are not going through our CDN. I wonder if it has something to do with the use of a label in the channel specifier.
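
As a rough way to check the same thing on another machine, the final URL that wget or curl reports after following the redirect can be classified by its host name. The sketch below is an illustration only: `served_from_s3` is a hypothetical helper, and it only recognizes the global `s3.amazonaws.com` endpoint style seen in the output above, not regional endpoints.

```python
from urllib.parse import urlsplit


def served_from_s3(final_url: str) -> bool:
    """Return True if the URL's host looks like a global S3 endpoint.

    Hypothetical helper: it only matches hosts of the form
    <bucket>.s3.amazonaws.com (or s3.amazonaws.com itself).
    """
    host = (urlsplit(final_url).hostname or "").lower()
    return host == "s3.amazonaws.com" or host.endswith(".s3.amazonaws.com")


# Host taken from the wget output above (query string omitted).
redirect_target = "https://binstar-cio-packages-prod.s3.amazonaws.com/6137debe5884df75bfa7428a/62f29bf519a6f124a4cdec61"
print(served_from_s3(redirect_target))
```

If this returns True for the URL your download ends up at, the file is being fetched straight from the bucket rather than through the CDN.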

I wonder if these will both perform the same for you:

conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
conda install -c "nvidia" cuda-toolkit

If they don't, that would point to the logic in the CDN that translates the URL to the destination. It may not handle labeled channels correctly.

barabo commented 1 year ago

@divideconcept @gorkamunoz @hidoba - if any/all of you could run these two commands and let me know if they run at the same or different speeds, that would help.

conda install -c "nvidia"                   cuda-toolkit
conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit

noxthot commented 1 year ago

I had the same problem the last couple of days. Today (~1-2 hours ago) I retried and everything was back to normal for

conda install -c "nvidia"                   cuda-toolkit

In the beginning

conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit

was incredibly slow, so I tried to get more information using curl; the speed there was quite normal, however. When I retried the conda install with cuda-11.7.0, the speed was back to normal there too. So I guess the timing of my curl test was bad.

To rephrase what @barabo already described: if someone else is experiencing this problem with a specific label/version, it would be helpful to get the curl download speed for that specific label. conda shows which version it is trying to download. The links to the files can be found here (you can choose from different labels at the top of the table): https://anaconda.org/nvidia/libcublas-dev/files?channel=cuda-11.7.0
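
The direct download links quoted in this thread all follow the same shape, so the link for a given package can be assembled from its parts. The sketch below is inferred only from those example URLs and is not a documented anaconda.org API; the function name, the default build string "0", and the `.tar.bz2` extension are assumptions that match the examples and may not hold for other packages.

```python
def package_download_url(channel: str, name: str, version: str,
                         subdir: str, build: str = "0") -> str:
    """Assemble a download URL in the shape seen in this thread.

    Hypothetical helper: the pattern is inferred from example URLs,
    not taken from official documentation.
    """
    filename = f"{name}-{version}-{build}.tar.bz2"
    return f"https://anaconda.org/{channel}/{name}/{version}/download/{subdir}/{filename}"


url = package_download_url("nvidia", "libcublas-dev", "11.10.1.25", "win-64")
print(url)
```

The resulting URL can then be passed to curl -L -o to time the download outside of conda.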

(I am located in Austria)

gorkamunoz commented 1 year ago

Hi @barabo, I tried both of your commands, and it is still slow today. I am also in Austria, so it may be related to the issue you described above. I used mamba to get more information on the installation; here is a picture showing the download speed when using conda install -c "nvidia" cuda-toolkit:

image

With conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit it's the same again: I get ~10kb/s downloads for libcublas-dev. For comparison, a check of my download speed in Chrome gives ~80Mb/s.

In my case I worked around these problems by installing everything with pip.

GevatterWuff commented 1 year ago

Also Austria, same problem, completely stuck at libcufft-dev-10.9.0. (144.6MB) @ 0%. No way to get GB files.

noxthot commented 1 year ago

I tried again at home now.

Download via conda and mamba is extremely slow.

  + cuda-cccl                    11.8.89  0      nvidia/linux-64        1MB
[..]
  + cuda-cuxxfilt                11.8.86  0      nvidia/linux-64      298kB
[..]
  + cuda-driver-dev              11.8.89  0      nvidia/linux-64       16kB
[..]
  + cuda-profiler-api            11.8.86  0      nvidia/linux-64       19kB
[..]
  + libcufile                   1.4.0.31  0      nvidia/linux-64      561kB
[..]
cuda-driver-dev                                     16.3kB @   2.5kB/s  6.5s
cuda-cuxxfilt                                      298.3kB @  37.9kB/s  7.9s
cuda-profiler-api                                   18.9kB @   2.4kB/s  8.0s
libcufile                                          561.4kB @  17.0kB/s 26.5s
cuda-cccl                                            1.2MB @  16.5kB/s 1m:14.6s

curl (I tested this right before and after the mamba call above) is fine:

curl -L -o /tmp/test.tar.z2 https://anaconda.org/nvidia/libcublas-dev/11.11.3.6/download/linux-64/libcublas-dev-11.11.3.6-0.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4019    0  4019    0     0   1886      0 --:--:--  0:00:02 --:--:--  1886
  9  394M    9 36.5M    0     0  2829k      0  0:02:22  0:00:13  0:02:09 3654k

Same file as in mamba output (cuda-cccl):

curl -L -o /tmp/test.tar.z2 https://anaconda.org/nvidia/cuda-cccl/11.8.89/download/linux-64/cuda-cccl-11.8.89-0.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3995    0  3995    0     0   1395      0 --:--:--  0:00:02 --:--:--  1394
100 1203k  100 1203k    0     0   270k      0  0:00:04  0:00:04 --:--:--  999k

A speed test showed that my download speed is currently ~35Mbps.
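
Since curl is fast while conda and mamba are slow, it may also help to time the same URL from Python. The sketch below uses only the standard library's `urllib`, which is an assumption on my part: it is not the download code conda or mamba actually use, so it may or may not reproduce the slowdown. `measure_throughput` and its parameters are hypothetical names.

```python
import time
import urllib.request


def measure_throughput(url: str, max_bytes: int = 20 * 1024 * 1024,
                       chunk_size: int = 64 * 1024) -> tuple[int, float]:
    """Read up to max_bytes from url; return (bytes_read, bytes_per_second).

    The elapsed time includes connection setup and any redirects.
    """
    start = time.perf_counter()
    read = 0
    with urllib.request.urlopen(url) as resp:
        while read < max_bytes:
            chunk = resp.read(min(chunk_size, max_bytes - read))
            if not chunk:  # end of file reached
                break
            read += len(chunk)
    elapsed = time.perf_counter() - start
    rate = read / elapsed if elapsed > 0 else float("inf")
    return read, rate
```

A possible call would be `measure_throughput("https://anaconda.org/nvidia/cuda-cccl/11.8.89/download/linux-64/cuda-cccl-11.8.89-0.tar.bz2")`; the server may treat a plain `urllib` client differently from curl or conda, so results are only indicative.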

barabo commented 1 year ago

I have good and bad news.

The good news is that conda installs that use the channel nvidia should be faster for our EMEA users now! There was indeed an issue with the CDN not resolving conda.anaconda.org/nvidia to our CDN clone, so that was fixed.

The bad news is that channels with labels will not resolve to the CDN for any channel. For example, nvidia/label/cuda-11.7.0 will still be slow for some users.

I've opened an issue internally to track the changes we would need to make to enable labeled channels to be fed from the CDN - but this change may take some time and effort.
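
Based purely on the description above, a channel specifier can be checked for a label to guess whether it is affected. This is a minimal sketch: `is_labeled_channel` is a hypothetical helper that only looks at the `name/label/value` form used in this thread and knows nothing about how the CDN actually routes requests.

```python
def is_labeled_channel(channel: str) -> bool:
    """Return True for specifiers of the form '<channel>/label/<label>'."""
    parts = channel.strip("/").split("/")
    return len(parts) >= 3 and parts[1] == "label"


for spec in ("nvidia", "nvidia/label/cuda-11.7.0"):
    status = "labeled, may still be slow" if is_labeled_channel(spec) else "plain channel"
    print(f"{spec}: {status}")
```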

8-chems commented 1 year ago

Hi there, do you have any suggestions on how to download the packages directly and install them offline? I've been trying to run the command conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia, but I haven't been successful due to connectivity issues. Any guidance on how to proceed would be greatly appreciated!