conda / infrastructure

A repo to report issues and have discussions about the conda infrastructure
BSD 3-Clause "New" or "Revised" License

Conda-Forge CDN Outage #851

Closed jakirkham closed 10 months ago

jakirkham commented 11 months ago

We are seeing an extended CDN outage in conda-forge. Please see the screenshot below.

image

jezdez commented 11 months ago

Thanks @jakirkham! We've bubbled it up in Anaconda.

h-vetinari commented 11 months ago

Seems we're coming back to normal... Thanks a lot for the quick help!

barabo commented 11 months ago

The disk on the clone worker filled up. @dholth and I will be making some changes later today that should eliminate this issue going forward. We'll update this issue after that's complete.

jakirkham commented 11 months ago

Thanks Carl & Jannis! 🙏

barabo commented 11 months ago

We freed up 1.2 TB of cached *.tar.bz2 package files from the local clone VM disk. This will not affect their availability in the channel! We also added a clean-up step to delete package files from the local disk that are confirmed to be available in the CDN. So, disk space should not be a problem for this stage of the CDN pipeline again.
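For readers curious what such a clean-up step might look like, here is a minimal sketch, not the actual pipeline code. The function name `prune_mirrored`, the `is_in_cdn` callback, and the directory layout are all hypothetical; how availability in the CDN is really confirmed is not described in this thread.

```python
from pathlib import Path
from typing import Callable, Iterable


def prune_mirrored(
    local_dir: Path,
    is_in_cdn: Callable[[str], bool],
    suffixes: Iterable[str] = (".tar.bz2",),
    dry_run: bool = True,
) -> int:
    """Delete local package files that are confirmed to be available upstream.

    `is_in_cdn` is a hypothetical callback that receives a path relative to
    the channel root (e.g. "linux-64/foo-1.0-0.tar.bz2") and returns True
    only when that file is confirmed to be served by the CDN.

    Returns the number of bytes freed (or that would be freed in a dry run).
    """
    suffixes = tuple(suffixes)
    freed = 0
    # Materialize the listing first so deleting files does not disturb the walk.
    for path in sorted(local_dir.rglob("*")):
        if not path.is_file() or not path.name.endswith(suffixes):
            continue  # leave repodata and other non-package files alone
        key = path.relative_to(local_dir).as_posix()
        if not is_in_cdn(key):
            continue  # keep anything not yet confirmed upstream
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed
```

Defaulting to `dry_run=True` and to `.tar.bz2` only is a deliberately cautious choice for the sketch: a first pass reports how much space would be freed, and other suffixes can be passed in explicitly.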

jakirkham commented 11 months ago

Thanks Carl! 🙏

That must have been a lot of *.tar.bz2 files. Were there *.conda files that needed cleaning up as well?

dholth commented 11 months ago

We will be able to clean up all indexed and mirrored packages, but so far we have been cautious and removed mostly older .tar.bz2 files.

jakirkham commented 11 months ago

OK, I am wondering about .conda packages, since those would likely cause the next backup (conda-forge is mainly producing .conda packages atm).

dholth commented 11 months ago

This is the beginning of related conda-index work. It may or may not be the right way to go, but by separating the cache and packages directories we could index packages straight from fsspec, S3, or some other remote storage. Similarly, the current CDN clone overrides https://github.com/conda/conda-index/blob/main/conda_index/index/sqlitecache.py#L425 so that the set of files-to-be-indexed does not hit the local filesystem.
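To illustrate the general idea only, here is a rough sketch of computing the set of files-to-be-indexed from a pluggable listing instead of the local filesystem. It does not reproduce conda-index's API or the CDN clone's override; `Entry`, `Lister`, `LocalLister`, `StaticLister`, and `needs_indexing` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Protocol

PACKAGE_SUFFIXES = (".tar.bz2", ".conda")


@dataclass(frozen=True)
class Entry:
    """What the indexer needs to know to decide whether a package changed."""
    name: str
    size: int
    mtime: int


class Lister(Protocol):
    """Anything that can list the packages in a subdir (hypothetical interface)."""
    def list_packages(self, subdir: str) -> Iterable[Entry]: ...


class LocalLister:
    """Lists packages by stat-ing files under a local channel root."""
    def __init__(self, root: Path) -> None:
        self.root = root

    def list_packages(self, subdir: str) -> Iterable[Entry]:
        for p in (self.root / subdir).iterdir():
            if p.name.endswith(PACKAGE_SUFFIXES):
                st = p.stat()
                yield Entry(p.name, st.st_size, int(st.st_mtime))


class StaticLister:
    """Stand-in for a remote listing, e.g. metadata fetched from object storage."""
    def __init__(self, listing: Dict[str, List[Entry]]) -> None:
        self.listing = listing

    def list_packages(self, subdir: str) -> Iterable[Entry]:
        return iter(self.listing.get(subdir, []))


def needs_indexing(lister: Lister, subdir: str, cached: Dict[str, Entry]) -> List[str]:
    """Return names of packages that are new or differ from the cached state."""
    return sorted(
        e.name for e in lister.list_packages(subdir) if cached.get(e.name) != e
    )
```

Because `needs_indexing` only depends on the `Lister` interface, the package files themselves never have to exist on the indexing machine's disk in this sketch; only the cache of previously seen entries does.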

jezdez commented 10 months ago

I'm closing this particular infra issue since it seems to have been resolved for now.