Open jakirkham opened 4 months ago
Here's another example:
We see some errors in the channel-clone VMs; we are checking whether this is related
Thanks Stefan! 🙏
Interested to learn what you discover 🙂
Adding a more recent example in case it is helpful
This one is approaching ~2.5hrs
Idk if there is something more going on with this case (hence mentioning it explicitly)
Edit: Seeing the same issue with this one
Another example after John referred me to this issue: 2+ hours after marking some packages as broken in https://github.com/conda-forge/admin-requests/commit/164a4271b51ee47fdcfdd52905e0137fbdf21003, packages are still not removed from the CDN (it's difficult to do quickfixes to wide-ranging breaks like this one if the turnaround is that long).
>mamba repoquery search -c conda-forge gcc_linux-aarch64=12.3.0=*_2 -p linux-64
Executing the query gcc_linux-aarch64=12.3.0=*_2
conda-forge/noarch 13.9MB @ 3.8MB/s 3.6s
conda-forge/linux-64 33.0MB @ 4.2MB/s 7.8s
Name Version Build Channel Subdir
------------------------------------------------------------------------
gcc_linux-aarch64 12.3.0 h490a0b6_2 (+ 1 builds) conda-forge linux-64 # should not contain h490a0b6_2
>python
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:40:50) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.Timestamp.now("utc")
Timestamp('2024-03-12 05:43:31.911276+0000', tz='UTC') # marked broken >2h before
We see some files generated by anaconda.org's dynamic repodata, like linux-aarch64/awscli-1.29.39-py38he37f277_0.conda, that can't be downloaded, and others, like noarch/strawberry-graphql-with-asgi-0.162.0-pyhd8ed1ab_0.conda, that cannot be re-indexed as part of the CDN process.
Interesting, thanks Daniel! 🙏
Is there something in regards to those packages specifically that looks relevant?
Or did something unrelated to those packages occur (like a network outage or running out of disk space)?
Were we able to determine the cause here?
We were able to find packages that have repodata but no downloadable archive (see the attached missing-packages.txt).
We were also able to fix a bug that is more likely to be the cause, where we would have trouble re-downloading a package in the CDN process if the first clone failed. We were not able to find the precise cause.
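For anyone wanting to reproduce the "repodata but no downloadable archive" check, here is a minimal sketch. It assumes the usual repodata layout (`packages` and `packages.conda` sections keyed by filename) and the `https://conda.anaconda.org/conda-forge/<subdir>/<filename>` URL pattern seen elsewhere in this thread. The function names are made up for illustration, and this is not the script that produced missing-packages.txt.

```python
import urllib.error
import urllib.request


def archive_exists(url, timeout=30):
    """Return True if a HEAD request for the archive answers 200.

    Assumption: the server answers HEAD requests for package archives.
    HTTP errors such as 404 count as "not downloadable"; other network
    errors are left to propagate so they are not mistaken for a missing file.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


def undownloadable(repodata, base_url, exists=archive_exists):
    """List filenames present in a parsed repodata dict whose archive is missing.

    `exists` is injectable so the logic can be exercised without network access.
    """
    names = list(repodata.get("packages", {}))
    names += list(repodata.get("packages.conda", {}))
    return [fn for fn in sorted(names) if not exists(f"{base_url}/{fn}")]
```

Checking every entry of a large subdir this way issues one request per package, so it is probably best run on a filtered subset.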
Gotcha thanks for the update Daniel! 🙏
Looked at the first one on the list, linux-aarch64/awscli-1.29.39-py38he37f277_0.conda. It looks like other OS and arch combinations for 1.29.39 had validation issues ( https://github.com/conda-forge/awscli-feedstock/issues/765#issuecomment-1702350487 ), but linux-aarch64 was not one of these. Eventually this got sorted out. Can see a successful build and upload for this package. Here is the attached log. Though agree downloading this package doesn't work. So not sure what happened here. Maybe it should be marked broken (unclear whether that helps this issue)
The second package, noarch/boto3-stubs-lite-1.26.89-pyhd8ed1ab_0.conda, had a validation error with that package specifically ( https://github.com/conda-forge/boto3-stubs-lite-feedstock/issues/120#issuecomment-1465186684 ). Same with the third package, noarch/ca-policy-lcg-1.119-hd8ed1ab_0.conda ( https://github.com/conda-forge/ca-policy-lcg-feedstock/issues/20#issue-1627340874 ). At least for these two cases, the conda-forge validation service notes the packages were not copied. However, there do appear to be packages with 0 downloads for both boto3-stubs-lite and ca-policy-lcg. Visiting or downloading either results in a 404. Perhaps these should be marked broken as well
Looking at the latter two cases, do not see them in https://conda.anaconda.org/conda-forge/noarch/repodata.json.bz2 , nor do they show up in conda-forge-repodata-patches. So am wondering how these are added to the repodata when no package was copied
Idk if an aborted copy in the conda-forge validation service could generate these issues, but that seems like one question that comes out of this
cc @beckermr (in case I'm missing anything here)
We download the dynamic anaconda.org repodata.json before creating the CDN version.
Can you please remind me which URL that lives under?
Just saw a particularly severe case today (still nothing after 3.5h for linux-64, 2h for win-64):
The logs show that the CDN clone process downloaded that archive from https://conda-web.anaconda.org/conda-forge/linux-64/clangdev-18.1.2-default_h127d8a8_0.conda, had a bad archive at 2024-03-21T02:29:30 and was able to get a good archive at 2024-03-21T06:12
Not that I pretend to understand the cloning mechanism (or the reasons why it might fail), but would it make sense to have a shorter retry loop for failed clones? Like try again immediately after, or after X minutes delay?
It does retry frequently; there may be an intermediate cache issue.
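To make the retry question above concrete, here is a rough sketch of a bounded retry with a fixed delay. It is purely illustrative: the actual clone process is not public in this thread, and `fetch`, `validate`, `attempts`, and `delay_seconds` are hypothetical names, not part of any Anaconda tooling.

```python
import time


def clone_with_retry(fetch, validate, attempts=3, delay_seconds=60, sleep=time.sleep):
    """Call fetch() until validate(archive) passes or attempts run out.

    Returns (archive, attempt_number) on success and raises RuntimeError
    otherwise. `sleep` is injectable so the loop can be tested without waiting.
    """
    for attempt in range(1, attempts + 1):
        archive = fetch()
        if validate(archive):
            return archive, attempt
        if attempt < attempts:
            # Wait before retrying; no sleep after the final failed attempt.
            sleep(delay_seconds)
    raise RuntimeError(f"no valid archive after {attempts} attempts")
```

If an intermediate cache keeps serving the bad archive, as suggested above, retrying sooner would not help on its own; the fetch would also need to bypass or invalidate that cache.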
The CDN appears to be down again.
Approaching the 500min mark 😬 Should that metric be part of https://anaconda.statuspage.io/?
We've addressed a disk-full issue.
Just ran into a network issue:
conda.CondaMultiError: ('Connection broken: IncompleteRead(199522674 bytes read, 79146888 more expected)', IncompleteRead(199522674 bytes read, 79146888 more expected))
('Connection broken: IncompleteRead(199522674 bytes read, 79146888 more expected)', IncompleteRead(199522674 bytes read, 79146888 more expected))
Wondering if this is related
> Approaching the 500min mark 😬 Should that metric be part of anaconda.statuspage.io?
Yes, I think we should start tracking this publicly somehow
CDN is at 37 minutes.
Should be resolved.
Last sync was done almost 10h ago now.
I confirmed it was not updated for 24 hours.
Looking at this case, mirroring has not completed after ~40mins
The linux-aarch64 (highlighted) and win-64 package above it are still mirroring. Approaching the ~1.5hr mark
Regarding cuda-tools, the linux-aarch64 package is now available on CDN. However the win-64 package is not. It has been ~3.25hrs since it was uploaded
% conda search 'conda-forge:cuda-tools[subdir=win-64]=12.4.1'
Loading channels: done
No match found for: cuda-tools=12.4.1[subdir=win-64]. Search: *cuda-tools*=12.4.1[subdir=win-64]
PackagesNotFoundError: The following packages are not available from current channels:
- cuda-tools=12.4.1[subdir=win-64]
Current channels:
- https://conda.anaconda.org/conda-forge/win-64
- https://conda.anaconda.org/conda-forge/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
CONDA_DEBUG=1 conda search 'conda-forge::cuda-tools[subdir=win-64]=12.4.1' --json
Thanks Daniel! 🙏
Debug info:
conda info
conda config --show-sources
conda list --show-channel-urls
Command result:
CONDA_DEBUG=1 conda search 'conda-forge::cuda-tools[subdir=win-64]=12.4.1'
Let's try reducing the cache TTL.
CONDA_LOCAL_REPODATA_TTL=0 CONDA_DEBUG=1 conda search 'conda-forge::cuda-tools[subdir=win-64]=12.4.1' --json
That looks promising
CONDA_LOCAL_REPODATA_TTL=0 CONDA_DEBUG=1 conda search 'conda-forge::cuda-tools[subdir=win-64]=12.4.1' --json
This case is approaching the 2hr mark, but does not appear to be picked up by CDN
Can find 1 of the 2 packages expected with CONDA_LOCAL_REPODATA_TTL=0 CONDA_DEBUG=1 conda search 'conda-forge::cuda-toolkit=12.4.1' --json
It looks like both packages are now available:
$ CONDA_LOCAL_REPODATA_TTL=0 CONDA_DEBUG=1 conda search 'conda-forge::cuda-toolkit=12.4.1' --json
DEBUG conda.gateways.logging:set_log_level(233): log_level set to 10
DEBUG conda.core.package_cache_data:_check_writable(321): package cache directory '/home/chl/.miniconda3-x86_64/pkgs' writable: True
DEBUG conda.gateways.repodata:fetch_latest(836): Local cache timed out for https://conda.anaconda.org/conda-forge/linux-64/repodata.json at /home/chl/.miniconda3-x86_64/pkgs/cache/497deca9.json
DEBUG conda.gateways.repodata.jlap.interface:__init__(41): Using ZstdRepoInterface
DEBUG conda.gateways.connection.session:add_binstar_token(247): Adding anaconda token for url <https://conda.anaconda.org/conda-forge/linux-64/repodata.json.zst>
DEBUG urllib3.connectionpool:_new_conn(1052): Starting new HTTPS connection (1): conda.anaconda.org:443
DEBUG urllib3.connectionpool:_make_request(546): https://conda.anaconda.org:443 "GET /t/ch-aad2628c-7f2b-496e-98ce-4fcff0ee47b9/conda-forge/linux-64/repodata.json.zst HTTP/1.1" 304 0
DEBUG conda.gateways.repodata.jlap.fetch:download_and_hash(243): https://conda.anaconda.org/conda-forge/linux-64/repodata.json.zst {'Date': 'Wed, 10 Apr 2024 17:33:49 GMT', 'Connection': 'keep-alive', 'CF-Ray': '87247bf1db4f6c79-DFW', 'CF-Cache-Status': 'HIT', 'Age': '1269', 'Cache-Control': 'public, max-age=1200', 'ETag': '"6134ed6707255ece9635559af12210f1"', 'Expires': 'Wed, 10 Apr 2024 17:53:49 GMT', 'Last-Modified': 'Wed, 10 Apr 2024 17:12:01 GMT', 'Vary': 'Accept-Encoding', 'x-amz-id-2': 'xXMe6GDWbJycN4+8p/BlbLkrY6hf+S8S8Ej7yaJjhjhoMx0RUMMY1OaU07Gb/yxsghhefqFxJqY=', 'x-amz-request-id': '4FRF0VNSKWS1SETG', 'x-amz-version-id': 'null', 'Set-Cookie': '__cf_bm=Zd4sY9TOVRGf_xkAFGWxZt22Jf9zV1CW1ZFappB_tHA-1712770429-1.0.1.1-Whna4UmYB02PdyGWm2hAjfjtN423iI9pahjBjPvhd19Y7Xir_WzKBEVZcHl7rVbWY94oHRFp5TOQeXEby1S1WrcfKFyABRrPsIO9Fo390pE; path=/; expires=Wed, 10-Apr-24 18:03:49 GMT; domain=.anaconda.org; HttpOnly; Secure; SameSite=None', 'Server': 'cloudflare'}
INFO conda.gateways.repodata.jlap.fetch:download_and_hash(264): Download 0 bytes {'User-Agent': 'conda/24.3.0 requests/2.31.0 CPython/3.11.7 Linux/6.5.0-26-generic linuxmint/21.3 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept': '*/*', 'Connection': 'keep-alive', 'if-none-match': '"6134ed6707255ece9635559af12210f1"'}
DEBUG conda.gateways.repodata.jlap.fetch:timeme(196): Download complete https://conda.anaconda.org/conda-forge/linux-64/repodata.json Took 0.09s
INFO conda.gateways.repodata.jlap.fetch:request_url_jlap_state(431): Apply 0 patches 4e79e31048896620… → 4e79e31048896620…
DEBUG conda.gateways.repodata.jlap.fetch:timeme(196): Apply Patches Took 0.00s
DEBUG conda.gateways.repodata:fetch_latest(836): Local cache timed out for https://conda.anaconda.org/conda-forge/noarch/repodata.json at /home/chl/.miniconda3-x86_64/pkgs/cache/09cdf8bf.json
DEBUG conda.gateways.repodata.jlap.interface:__init__(41): Using ZstdRepoInterface
DEBUG conda.gateways.connection.session:add_binstar_token(247): Adding anaconda token for url <https://conda.anaconda.org/conda-forge/noarch/repodata.json.zst>
DEBUG urllib3.connectionpool:_new_conn(1052): Starting new HTTPS connection (2): conda.anaconda.org:443
DEBUG urllib3.connectionpool:_make_request(546): https://conda.anaconda.org:443 "GET /t/ch-aad2628c-7f2b-496e-98ce-4fcff0ee47b9/conda-forge/noarch/repodata.json.zst HTTP/1.1" 304 0
DEBUG conda.gateways.repodata.jlap.fetch:download_and_hash(243): https://conda.anaconda.org/conda-forge/noarch/repodata.json.zst {'Date': 'Wed, 10 Apr 2024 17:33:52 GMT', 'Connection': 'keep-alive', 'CF-Ray': '87247c051e116b37-DFW', 'CF-Cache-Status': 'HIT', 'Age': '1272', 'Cache-Control': 'public, max-age=1200', 'ETag': '"3605ffb16e98f38bc071ce70257c459c"', 'Expires': 'Wed, 10 Apr 2024 17:53:52 GMT', 'Last-Modified': 'Wed, 10 Apr 2024 17:12:16 GMT', 'Vary': 'Accept-Encoding', 'x-amz-id-2': 'o7PiCRiTKh0i28OBv9US4lJb8Bne90hS+dQWjje44eZ8eGKGXf3KijXyyGIqkITCOeFXcv6WIwU=', 'x-amz-request-id': '4FR2PM0C9P3JSPP4', 'x-amz-version-id': 'null', 'Server': 'cloudflare'}
INFO conda.gateways.repodata.jlap.fetch:download_and_hash(264): Download 0 bytes {'User-Agent': 'conda/24.3.0 requests/2.31.0 CPython/3.11.7 Linux/6.5.0-26-generic linuxmint/21.3 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Accept': '*/*', 'Connection': 'keep-alive', 'if-none-match': '"3605ffb16e98f38bc071ce70257c459c"', 'Cookie': '__cf_bm=Zd4sY9TOVRGf_xkAFGWxZt22Jf9zV1CW1ZFappB_tHA-1712770429-1.0.1.1-Whna4UmYB02PdyGWm2hAjfjtN423iI9pahjBjPvhd19Y7Xir_WzKBEVZcHl7rVbWY94oHRFp5TOQeXEby1S1WrcfKFyABRrPsIO9Fo390pE'}
DEBUG conda.gateways.repodata.jlap.fetch:timeme(196): Download complete https://conda.anaconda.org/conda-forge/noarch/repodata.json Took 0.10s
INFO conda.gateways.repodata.jlap.fetch:request_url_jlap_state(431): Apply 0 patches 3a8d1a3f16e02a99… → 3a8d1a3f16e02a99…
DEBUG conda.gateways.repodata.jlap.fetch:timeme(196): Apply Patches Took 0.00s
{
"cuda-toolkit": [
{
"arch": null,
"build": "h7428d3b_0",
"build_number": 0,
"channel": "https://conda.anaconda.org/conda-forge/noarch",
"constrains": [],
"depends": [
"__win",
"cuda-compiler 12.4.1.*",
"cuda-libraries 12.4.1.*",
"cuda-libraries-dev 12.4.1.*",
"cuda-nvml-dev 12.4.127.*",
"cuda-tools 12.4.1.*"
],
"fn": "cuda-toolkit-12.4.1-h7428d3b_0.conda",
"license": "LicenseRef-NVIDIA-End-User-License-Agreement",
"md5": "da683653aaadbeeb5a7a2561b2464728",
"name": "cuda-toolkit",
"noarch": "generic",
"package_type": "noarch_generic",
"platform": null,
"sha256": "3f750755a089f61fb58e6e255706bd2d010fd217fcb0a503c2de9fe3a337b247",
"size": 20519,
"subdir": "noarch",
"timestamp": 1712711684371,
"url": "https://conda.anaconda.org/conda-forge/noarch/cuda-toolkit-12.4.1-h7428d3b_0.conda",
"version": "12.4.1"
},
{
"arch": null,
"build": "ha804496_0",
"build_number": 0,
"channel": "https://conda.anaconda.org/conda-forge/noarch",
"constrains": [],
"depends": [
"__linux",
"cuda-compiler 12.4.1.*",
"cuda-libraries 12.4.1.*",
"cuda-libraries-dev 12.4.1.*",
"cuda-nvml-dev 12.4.127.*",
"cuda-tools 12.4.1.*"
],
"fn": "cuda-toolkit-12.4.1-ha804496_0.conda",
"license": "LicenseRef-NVIDIA-End-User-License-Agreement",
"md5": "e1e8cfdbb172f4b6558ce2db688e851f",
"name": "cuda-toolkit",
"noarch": "generic",
"package_type": "noarch_generic",
"platform": null,
"sha256": "c3faecbb52cbdb82d1723e82ed572ae23bc187863b7f499f8afe7247cf1178c1",
"size": 20097,
"subdir": "noarch",
"timestamp": 1712717191443,
"url": "https://conda.anaconda.org/conda-forge/noarch/cuda-toolkit-12.4.1-ha804496_0.conda",
"version": "12.4.1"
}
]
}
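The response headers in the debug log above (`Age`, `Cache-Control: max-age`, `Last-Modified`) are the fields worth comparing when judging how old a cached repodata response is. A small sketch for pulling them out of a headers dict follows; the helper name is made up, and it only reports the numbers without saying anything about the root cause of a delay.

```python
import re


def cache_freshness(headers):
    """Return (age_seconds, max_age_seconds, past_max_age) from response headers.

    `headers` is a plain dict like the one printed in the conda debug log.
    max_age_seconds is None when Cache-Control carries no max-age directive.
    """
    age = int(headers.get("Age", 0))
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    max_age = int(match.group(1)) if match else None
    past_max_age = max_age is not None and age > max_age
    return age, max_age, past_max_age
```

For the linux-64 response above (`Age: 1269`, `max-age=1200`), this reports an entry 69 seconds past its max-age.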
This latest delay is most likely due to the issues with the anaconda.org backend (xref: conda/infrastructure#899). The channel cloning process relies on various calls to .org's API; the .org database was sporadically triggering the OOM (out-of-memory) killer on the backend host. Anaconda's infrastructure team has expanded memory and scaling allocation for the database backend, and that should help stabilize things again.
Could we write a process that periodically checks repodata-clone.json versus repodata.json to keep track of any delay between packages appearing in the former versus the latter?
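A minimal sketch of that comparison, assuming both files follow the usual repodata layout (`packages` and `packages.conda` sections keyed by filename) and that each record may carry a millisecond `timestamp`, as in the JSON output earlier in this thread. The function names are hypothetical. The `timestamp` field records build time rather than upload time, so the reported age is only an upper bound on the mirroring delay.

```python
from datetime import datetime, timezone


def package_index(repodata):
    """Merge the .tar.bz2 and .conda sections of a parsed repodata dict."""
    merged = {}
    merged.update(repodata.get("packages", {}))
    merged.update(repodata.get("packages.conda", {}))
    return merged


def pending_packages(clone_repodata, cdn_repodata, now=None):
    """Return [(filename, age_minutes)] for packages in the clone but not on the CDN.

    age_minutes is measured from the record's build timestamp, or None if
    the record has no timestamp.
    """
    now = now or datetime.now(timezone.utc)
    clone = package_index(clone_repodata)
    cdn = package_index(cdn_repodata)
    pending = []
    for fn in sorted(clone.keys() - cdn.keys()):
        ts_ms = clone[fn].get("timestamp")
        age = None
        if ts_ms is not None:
            built = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
            age = (now - built).total_seconds() / 60
        pending.append((fn, age))
    return pending
```

Run periodically per subdir, the largest age in the result could serve as the "CDN lag" metric discussed above.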
Thanks Cheng and Daniel! 🙏
Daniel, think that is a good idea. If there is some way to share log details or maybe graphs on resource usage, that might help as well
We were also wondering if it would make sense to have a GH template for CDN issues ( https://github.com/conda/infrastructure/issues/912 ). Are there specific pieces of info we should be capturing that would help narrow things down?
Seeing issues today with https://anaconda.org/conda-forge/hyperion-fortran/files, not available after 2h
Am seeing this with recent llvmlite RC packages. These were uploaded ~2hrs ago. However it looks like none are available yet
To know whether recent packages can be pulled, one can watch the last update of repodata.json with:
watch -n 1 "curl -sI https://conda.anaconda.org/conda-forge/linux-64/repodata.json | grep -i 'last-modified'"
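The same check can be turned into a number (minutes since the last CDN update) with a few lines of Python. This is a sketch using only the standard library; the function names are made up, and it assumes the server answers HEAD requests with a `Last-Modified` header, as the `curl -sI` command above relies on.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fetch_last_modified(url, timeout=30):
    """Return the raw Last-Modified header for url via a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers["Last-Modified"]


def minutes_since(last_modified, now=None):
    """Minutes elapsed between an HTTP date string and now (UTC)."""
    modified = parsedate_to_datetime(last_modified)
    now = now or datetime.now(timezone.utc)
    return (now - modified).total_seconds() / 60
```

With the values from the debug log earlier in the thread (`Last-Modified: Wed, 10 Apr 2024 17:12:01 GMT`, response `Date: Wed, 10 Apr 2024 17:33:49 GMT`), `minutes_since` gives 21.8 minutes.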
It looks like it is taking over 1hr for some packages to mirror. For example:
Do we know what might be causing this?