conda / infrastructure

A repo to report issues and have discussions about the conda infrastructure
BSD 3-Clause "New" or "Revised" License

Seeing timeouts with https://conda.anaconda.org/rapidsai-nightly #906

Open jakirkham opened 5 months ago

jakirkham commented 5 months ago

What happened?

#11 142.8     Traceback (most recent call last):
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/conda/exception_handler.py", line 17, in __call__
#11 142.8         return func(*args, **kwargs)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 959, in exception_converter
#11 142.8         raise e
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 952, in exception_converter
#11 142.8         exit_code = _wrapped_main(*args, **kwargs)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 898, in _wrapped_main
#11 142.8         result = do_call(parsed_args, p)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 763, in do_call
#11 142.8         exit_code = install(args, parser, "install")
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 558, in install
#11 142.8         transaction.fetch_extract_packages()
#11 142.8     RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai-nightly/linux-64/librmm-24.04.00a39-cuda12_240402_g0651edf0_39.tar.bz2]
#11 142.8     Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds

Conda Info

active environment : None
       user config file : /home/rapids/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 24.1.2
    conda-build version : not installed
         python version : 3.9.19.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=sapphirerapids
                          __conda=24.1.2=0
                          __glibc=2.35=0
                          __linux=6.5.0=0
                          __unix=0=0
       base environment : /opt/conda  (writable)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/rapidsai-nightly/linux-64
                          https://conda.anaconda.org/rapidsai-nightly/noarch
                          https://conda.anaconda.org/dask/label/dev/linux-64
                          https://conda.anaconda.org/dask/label/dev/noarch
                          https://conda.anaconda.org/pytorch/linux-64
                          https://conda.anaconda.org/pytorch/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/nvidia/linux-64
                          https://conda.anaconda.org/nvidia/noarch
          package cache : /opt/conda/pkgs
                          /home/rapids/.conda/pkgs
       envs directories : /opt/conda/envs
                          /home/rapids/.conda/envs
               platform : linux-64
             user-agent : conda/24.1.2 requests/2.31.0 CPython/3.9.19 Linux/6.5.0-1016-aws ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
                UID:GID : 1001:1000
             netrc file : None
           offline mode : False

Conda Config

==> /opt/conda/.condarc <==
auto_update_conda: False
channels:
  - rapidsai-nightly
  - dask/label/dev
  - pytorch
  - conda-forge
  - nvidia

Conda list

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
archspec                  0.2.3              pyhd8ed1ab_0    conda-forge
boltons                   24.0.0             pyhd8ed1ab_0    conda-forge
brotli-python             1.1.0            py39h3d6467e_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0           py39h7a31438_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
conda                     24.1.2           py39hf3d152e_0    conda-forge
conda-libmamba-solver     24.1.0             pyhd8ed1ab_0    conda-forge
conda-package-handling    2.2.0              pyh38be061_0    conda-forge
conda-package-streaming   0.9.0              pyhd8ed1ab_0    conda-forge
cryptography              42.0.5           py39hd4f0224_0    conda-forge
distro                    1.9.0              pyhd8ed1ab_0    conda-forge
fmt                       10.2.1               h00ab1b0_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
jsonpatch                 1.33               pyhd8ed1ab_0    conda-forge
jsonpointer               2.4              py39hf3d152e_3    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libarchive                3.7.2                h2aa1ff5_1    conda-forge
libcurl                   8.7.1                hca28451_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libmamba                  1.5.8                had39da4_0    conda-forge
libmambapy                1.5.8            py39h10defb6_0    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsolv                   0.7.28               hfc55251_2    conda-forge
libsqlite                 3.45.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.6               h232c23b_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     1.5.8            py39hc5d2bb1_0    conda-forge
menuinst                  2.0.2            py39hf3d152e_0    conda-forge
ncurses                   6.4.20240210         h59595ed_0    conda-forge
openssl                   3.2.1                hd590300_1    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
platformdirs              4.2.0              pyhd8ed1ab_0    conda-forge
pluggy                    1.4.0              pyhd8ed1ab_0    conda-forge
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pycosat                   0.6.6            py39hd1e30aa_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pyopenssl                 24.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.19          h0755675_0_cpython    conda-forge
python_abi                3.9                      4_cp39    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reproc                    14.2.4.post0         hd590300_1    conda-forge
reproc-cpp                14.2.4.post0         h59595ed_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
ruamel.yaml               0.18.6           py39hd1e30aa_0    conda-forge
ruamel.yaml.clib          0.2.8            py39hd1e30aa_0    conda-forge
setuptools                69.2.0             pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     0.12.1             pyhd8ed1ab_0    conda-forge
tqdm                      4.66.2             pyhd8ed1ab_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml-cpp                  0.8.0                h59595ed_0    conda-forge
zstandard                 0.22.0           py39h6e5214e_0    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Additional Context

Here is the full log

jakirkham commented 5 months ago

cc @raydouglass @mmccarty (for vis)

jakirkham commented 5 months ago

@dholth could you please help us look at this?

Looks like this issue started last week: https://github.com/conda/infrastructure/issues/899#issuecomment-2026232053 (right before a company break)

raydouglass commented 5 months ago

Just trying to rule out possible issues; we encounter network issues when switching from mamba to conda invocations as well.

https://github.com/rapidsai/docker/actions/runs/8528278564/job/23361587494?pr=650#step:11:236

#11 60.18 CondaHTTPError: HTTP 520 CONNECTION FAILED for url <https://conda.anaconda.org/dask/label/dev/noarch/repodata.json>

https://github.com/rapidsai/docker/actions/runs/8528278564/job/23361586014?pr=650#step:9:1193

#12 156.8 Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='conda.anaconda.org', port=443): Read timed out. (read timeout=60.0)")': /rapidsai-nightly/linux-64/dask-cudf-24.04.00a586-cuda11_py311_240402_g35f818b3e4_586.tar.bz2
#12 343.0 CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/rapidsai-nightly/linux-64/dask-cudf-24.04.00a586-cuda11_py311_240402_g35f818b3e4_586.tar.bz2>

raydouglass commented 5 months ago

Also, I cannot say this with 100% confidence, but so far all of the errors I've seen while reviewing logs from the past week involved the rapidsai-nightly or dask/label/dev channels. That could be a coincidence: we download a lot of large packages from rapidsai-nightly, so the odds of a network error are higher.

I have not checked every single error though.
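
One way to check every error rather than a sample would be to tally the failing URLs by channel. The following is a minimal sketch, not something used in this thread: the helper names `channel_of` and `tally_channels` are hypothetical, the `SUBDIRS` set is an assumption that covers only the platform subdirectories seen in the logs here, and the function counts every conda.anaconda.org URL in the lines it is given, so it should be fed error lines only.

```python
import re
from collections import Counter

# Platform subdirectories that separate the channel name from the file name.
# Assumption: only the subdirs that appear in the logs in this thread.
SUBDIRS = {"linux-64", "noarch"}

# Capture the URL path up to whitespace or a closing ']' / '>'.
URL_RE = re.compile(r"https://conda\.anaconda\.org/([^\s\]>]+)")


def channel_of(url_path):
    """Return the channel part of a path like 'dask/label/dev/noarch/repodata.json'."""
    parts = url_path.split("/")
    for i, part in enumerate(parts):
        if part in SUBDIRS:
            return "/".join(parts[:i])
    return None  # no known subdir found, so the channel cannot be determined


def tally_channels(error_lines):
    """Count URLs per channel across the given (error) log lines."""
    counts = Counter()
    for line in error_lines:
        for match in URL_RE.finditer(line):
            channel = channel_of(match.group(1))
            if channel:
                counts[channel] += 1
    return counts
```

Calling `tally_channels(lines).most_common()` on the error lines of a week of logs would show whether the failures really cluster on one or two channels. It would not by itself separate "this channel is flaky" from "this channel serves the most bytes".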

jakirkham commented 5 months ago

cc @mariusvniekerk (for awareness)

jakirkham commented 5 months ago

Also saw this with rapidsai in this CI job

error    libmamba ZSTD decompression error: Unknown frame descriptor
Download error (23) Failed writing received data to disk/application [https://conda.anaconda.org/rapidsai/noarch/repodata.json.zst]
Failure writing output to destination, passed 689 returned 690

jakirkham commented 5 months ago

Have also seen this with cf-staging. Snippet below from this GHA job:

E           binstar_client.errors.ServerError: ('?: Undefined error ([GET] https://api.anaconda.org/dist/cf-staging/blah-2696ff/2024.04.03.03.11.49/noarch%2Fblah-2696ff-2024.04.03.03.11.49-py_0.tar.bz2 -> 524)', 524)

jezdez commented 5 months ago

Just noting that we're investigating this still

morremeyer commented 5 months ago

Hey everyone, quick info from the infrastructure side of Anaconda: We're on this and have found an issue in the underlying infrastructure that is likely to cause this. We're going to implement a fix for this suspected cause in the next few hours.

I'll let you know when this has been rolled out.

morremeyer commented 5 months ago

We have rolled out the configuration changes that should fix these timeouts. Please let us know if you continue to see these issues.

raydouglass commented 5 months ago

@jezdez & @morremeyer Thanks for the update!

We were consistently encountering the networking errors during one of our workflows over the past week+.

I triggered a rerun and it was able to successfully download all of the conda packages, so it seems things have improved!

traversaro commented 4 months ago

> We have rolled out the configuration changes that should fix these timeouts. Please let us know if you continue to see these issues.

I am still seeing these kinds of problems in the robotology and robostack-staging channels (posting here as I guess the problem is similar; if you prefer that I open a different issue, just let me know). Example CI failure:

Restarting the CI typically solves these issues.
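
Since a rerun usually succeeds, the same effect can be had inside the job by retrying the failing step with a growing delay. The following is a minimal sketch under stated assumptions, not what any of the CI jobs in this thread do: `retry_with_backoff` is a hypothetical helper, and treating `OSError` (which covers timeouts and connection errors in Python) and `RuntimeError` (the type in the mamba traceback above) as transient is an assumption.

```python
import time


def retry_with_backoff(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        # Assumption: these exception types are the transient ones.
        except (OSError, RuntimeError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the real error
            sleep(base_delay * 2 ** attempt)
```

Here `operation` would be whatever function runs the download or install step. The `sleep` parameter exists only so the delay can be replaced in tests. Retrying hides the symptom from CI, so it is worth logging each retry to keep the failure rate visible.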

jakirkham commented 3 months ago

Are others still seeing this? If so, a fresh reproducer would help

If not, would suggest we close to clean up the issue tracker and focus efforts on current issues

traversaro commented 3 months ago

Based on https://github.com/robotology/icub-models/actions/workflows/cxx-ci.yml it seems to me that it is still happening, at a rough rate of 1 in ~20 builds, but much less frequently than in early April, when the failure rate was 1 in ~3 or 4 builds.

I am OK with closing if keeping the issue open is not useful.
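
To put the rough rates quoted above in perspective, here is a small arithmetic sketch of how often a build would still fail after automatic reruns. It assumes that failures are independent between runs, which may not hold during an outage; the function name `chance_all_attempts_fail` is hypothetical.

```python
def chance_all_attempts_fail(failure_rate, attempts):
    """Probability that every one of `attempts` independent runs fails."""
    return failure_rate ** attempts


# Rough per-build failure rates quoted in this thread.
for label, rate in [("1 in 20", 1 / 20), ("1 in 4", 1 / 4), ("1 in 3", 1 / 3)]:
    chances = [round(chance_all_attempts_fail(rate, n), 4) for n in (1, 2, 3)]
    print(label, chances)
```

Under that assumption, a single rerun turns a 1 in 20 failure rate into 1 in 400, while the early April rate of 1 in 3 or 4 would still leave roughly 1 in 9 to 1 in 16 builds failing twice in a row.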

jakirkham commented 3 months ago

Current examples seem worthy of discussion

No strong feelings as to whether that stays in this issue or is moved to a new one