conda / infrastructure

A repo to report issues and have discussions about the conda infrastructure
BSD 3-Clause "New" or "Revised" License

General slowness observed on anaconda.org (web, API, file serving) #899

Closed mbargull closed 3 months ago

mbargull commented 5 months ago

What happened?

This is a rather vague report, but I have recently noticed very slow response times for anaconda.org:

  1. on the web interface,
  2. via the API (this one I can pin down to happening at least from 2024-03-22 03:00+00 onward),
  3. for non-CDN-backed channels (i.e., not conda-forge/bioconda), e.g., conda-forge/label/... channels.

For the last item, one recent piece of evidence is:

Download error (28) Timeout was reached [https://conda.anaconda.org/conda-forge/label/lief_rc/noarch/repodata.json.zst]
Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds

observed at https://dev.azure.com/conda-forge/84710dde-1620-425b-80d0-4cf5baca359d/_build/results?buildId=904323&view=logs&j=a70f640f-cc53-5cd3-6cdc-236a1aa90802
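
For readers unfamiliar with the message above: error 28 with "Operation too slow" is what libcurl reports when a transfer stays under a low-speed limit for too long (libcurl exposes this as `CURLOPT_LOW_SPEED_LIMIT` and `CURLOPT_LOW_SPEED_TIME`). The sketch below is a simplified Python model of such a rule, using the 30 bytes/sec and 60 seconds from the log. It is an illustration only, not libcurl's actual implementation; the function name and the per-second sampling are assumptions.

```python
from typing import Iterable, Optional


def low_speed_abort_second(
    bytes_per_second: Iterable[int],
    limit: int = 30,   # bytes/sec threshold, taken from the log message
    window: int = 60,  # seconds the rate must stay below the threshold
) -> Optional[int]:
    """Return the 1-based second at which a transfer would be aborted.

    Simplified model: abort once `window` consecutive one-second samples
    are all below `limit`. Returns None if that never happens.
    """
    slow_streak = 0
    for second, received in enumerate(bytes_per_second, start=1):
        if received < limit:
            slow_streak += 1
            if slow_streak >= window:
                return second
        else:
            # Any sufficiently fast second resets the streak.
            slow_streak = 0
    return None


if __name__ == "__main__":
    # A connection that is accepted but then sends nothing for a minute.
    print(low_speed_abort_second([0] * 60))
    # A healthy transfer never trips the rule.
    print(low_speed_abort_second([1000] * 120))
```

Under this model a server that accepts the connection but stalls trips the rule after one minute, which would fit a stalled origin rather than a slow client link.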

mbargull commented 5 months ago

I just got a plain

upstream connect error or disconnect/reset before headers. reset reason: connection termination

as a response to opening https://anaconda.org/main/repo in the web browser. After that, pages that did load took a minute or so. I also got a 520 once:

Web server is returning an unknown error Error code 520
Visit [cloudflare.com](https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_520&utm_campaign=anaconda.org) for more information.
2024-03-27 15:40:14 UTC

chenghlee commented 5 months ago

Can you try this for some channel not conda-forge, bioconda, or Anaconda? There's some Cloudflare wizardry that happens for larger channels that splits out requests that are bound for CDN (e.g., main label) and requests that are not (e.g., other labels).

mbargull commented 5 months ago

Do you happen to know any non-CDN-backed channel that is somewhat large? To me, it seems like some server load issues are happening since very small requests/channels seem to work fine.

chenghlee commented 5 months ago

Can you try https://anaconda.org/LiteX-Hub/?

mbargull commented 5 months ago

Got a

upstream connect error or disconnect/reset before headers. reset reason: connection termination

right away for https://anaconda.org/LiteX-Hub/ .

mbargull commented 5 months ago

Downloading https://anaconda.org/LiteX-Hub/ via curl took about 31 seconds but worked. After that, it is cached on the CDN for me and as such seems to be served more reliably and faster (it can still take up to 10 seconds, though).

mbargull commented 5 months ago

Just happened incidentally:

# conda-smithy rerender
Traceback (most recent call last):
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/bin/conda-smithy", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/cli.py", line 737, in main
    args.subcommand_func(args)
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/cli.py", line 584, in __call__
    self._call(args, tmpdir)
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/cli.py", line 589, in _call
    configure_feedstock.main(
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/configure_feedstock.py", line 2602, in main
    check_version_uptodate("conda-smithy", __version__, True)
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/configure_feedstock.py", line 2310, in check_version_uptodate
    most_recent_version = get_most_recent_version(name).version
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/conda_smithy/configure_feedstock.py", line 2299, in get_most_recent_version
    request.raise_for_status()
  File "/home/mbargull/code/conda/conda-forge/envs/conda-smithy/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 524 Server Error:  for url: https://api.anaconda.org/package/conda-forge/conda-smithy

chenghlee commented 5 months ago

Anaconda's infrastructure team has noticed a scaling issue with the .org back-end; that might be related, and they're currently working on resolving that issue. I'll update this ticket once they're done deploying the changes, and we can recheck to see if that fixes the issue.

mbargull commented 5 months ago

Thanks for keeping an eye out, and thanks to the infra team for working on it! :hammer_and_wrench:

rymndhng commented 5 months ago

> Can you try this for some channel not conda-forge, bioconda, or Anaconda? There's some Cloudflare wizardry that happens for larger channels that splits out requests that are bound for CDN (e.g., main label) and requests that are not (e.g., other labels).

As a data point, one of my local tests against the comet_ml channel hit this issue as well:

❯  curl -I -X GET https://conda.anaconda.org/comet_ml/noarch/repodata.json
HTTP/2 503 
date: Wed, 27 Mar 2024 16:51:42 GMT
content-type: text/html; charset=utf-8
cf-ray: 86b0e172de228417-YVR
cf-cache-status: DYNAMIC
cache-control: no-cache, max-age=0
content-disposition: inline; filename=db_connection_failure.html
expires: Wed, 27 Mar 2024 16:51:42 GMT
last-modified: Tue, 27 Feb 2024 15:49:08 GMT
x-envoy-upstream-service-time: 63125
set-cookie: __cf_bm=XrShwBedMNXr6JIgiG4V4_a.E8uNRzG76qurvQ8rqA0-1711558302-1.0.1.1-byOHt3W7jRO7wPo0lClotR.iFh8F5YVPHUG6kqWbN4Fm09VSFkW34lffnqjQ.aSJL7_BfpEr.RbKM_0UKXc8ueGpuWEqdok1gifLj8pYQWY; path=/; expires=Wed, 27-Mar-24 17:21:42 GMT; domain=.anaconda.org; HttpOnly; Secure; SameSite=None
server: cloudflare
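
The headers above carry most of the diagnostic signal: `cf-cache-status: DYNAMIC` means Cloudflare did not serve the response from its cache, and `x-envoy-upstream-service-time` is reported by Envoy in milliseconds, so 63125 corresponds to roughly 63 seconds spent upstream. A minimal Python sketch for pulling these fields out of pasted `curl -I` output follows. The helper names are hypothetical, and the sample is an abbreviated copy of the output above.

```python
from typing import Dict, Optional

# Abbreviated copy of the curl output quoted above (cookie header omitted).
SAMPLE = """\
HTTP/2 503
date: Wed, 27 Mar 2024 16:51:42 GMT
content-type: text/html; charset=utf-8
cf-cache-status: DYNAMIC
content-disposition: inline; filename=db_connection_failure.html
x-envoy-upstream-service-time: 63125
server: cloudflare
"""


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse 'name: value' lines into a dict with lower-cased names."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the status line
            headers[name.strip().lower()] = value.strip()
    return headers


def summarize(raw: str) -> Dict[str, Optional[object]]:
    """Extract status code, cache status and upstream time in seconds."""
    lines = raw.strip().splitlines()
    status = int(lines[0].split()[1])  # "HTTP/2 503" -> 503
    headers = parse_headers("\n".join(lines[1:]))
    upstream_ms = headers.get("x-envoy-upstream-service-time", "")
    return {
        "status": status,
        "cache_status": headers.get("cf-cache-status"),
        "upstream_seconds": int(upstream_ms) / 1000 if upstream_ms.isdigit() else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```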

mariusvniekerk commented 5 months ago

This still seems to be happening sporadically with the rapidsai channel, at the very least when using mamba:

info     libmamba Transfer done for 'rapidsai/linux-64'
info     libmamba Transfer finalized, status: 200 [https://conda.anaconda.org/rapidsai/linux-64/repodata.json] 263365 bytes
info     libmamba Transfer done for 'conda-forge/noarch'
info     libmamba Transfer finalized, status: 200 [https://conda.anaconda.org/conda-forge/noarch/repodata.json] 16311864 bytes
info     libmamba Transfer done for 'conda-forge/linux-64'
info     libmamba Transfer finalized, status: 200 [https://conda.anaconda.org/conda-forge/linux-64/repodata.json] 39168964 bytes
info     libmamba Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai/noarch/repodata.json]
    Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds
Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai/noarch/repodata.json]
Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/opt/mambaforge/lib/python3.10/site-packages/conda/exceptions.py", line 1132, in __call__
        return func(*args, **kwargs)
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 941, in exception_converter
        raise e
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 934, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 892, in _wrapped_main
        result = do_call(parsed_args, p)
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 758, in do_call
        exit_code = create(args, parser)
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 632, in create
        return install(args, parser, "create")
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/mamba.py", line 499, in install
        index = load_channels(pool, channels, repos)
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/utils.py", line 129, in load_channels
        index = get_index(
      File "/opt/mambaforge/lib/python3.10/site-packages/mamba/utils.py", line 110, in get_index
        is_downloaded = dlist.download(api.MAMBA_DOWNLOAD_FAILFAST)
    RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai/noarch/repodata.json]
    Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds

jakirkham commented 3 months ago

At least in the RAPIDS case, we managed to fix the issues in https://github.com/conda/infrastructure/issues/906.

mbargull commented 3 months ago

In the last few weeks I've only noticed some sporadic failures, again with (Cloudflare-specific) 520 or 524 HTTP status codes, but they are not as prevalent as in March/April. Let's close this for now. Thanks for working on stabilizing and generally keeping things going!
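
For clients that still hit the occasional 520 or 524, one possible mitigation is to retry with exponential backoff. The following is a minimal sketch under stated assumptions, not something conda, mamba, or conda-smithy is known to do: `fetch_with_retry`, its parameters, and the choice of retryable codes are all hypothetical, and `fetch` is any caller-supplied callable returning a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

# Cloudflare-specific status codes mentioned in this thread (an assumption
# about what is worth retrying, not an exhaustive list).
RETRYABLE_STATUSES = {520, 524}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `fetch` until it returns a non-retryable status or attempts run out.

    Waits base_delay * 2**n seconds between attempts (1s, 2s, 4s, ...).
    Returns the last (status, body) pair seen.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < attempts - 1:
            # Back off before the next attempt; no sleep after the last one.
            sleep(base_delay * 2 ** attempt)
    return status, body


if __name__ == "__main__":
    # Simulated server: two Cloudflare errors, then success.
    responses = iter([(524, b""), (520, b""), (200, b"ok")])
    print(fetch_with_retry(lambda: next(responses), sleep=lambda s: None))
```

With the `requests` library, `fetch` could wrap `requests.get(url, timeout=...)` and return `(response.status_code, response.content)`. Injecting `sleep` keeps the helper testable without real delays.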