conda / infrastructure

A repo to report issues and have discussions about the conda infrastructure
BSD 3-Clause "New" or "Revised" License

CDN sync seems to be slower than usual this week #740

Closed leofang closed 10 months ago

leofang commented 1 year ago

### What happened?

CDN sync seems to be slower than usual this week. Taking libcublas as an example:

[Screenshot 2023-04-05, 8:42:23 PM]

I started monitoring the status via `conda search --platform linux-aarch64 libcublas` after this PR was merged and the copy to the conda-forge channel was done, and as shown above it took ~47 mins for `conda search` to find it. IIRC the CDN sync time had previously been reduced to 15-30 mins, so this is a bit concerning.
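
The monitoring described above could be scripted roughly as follows. This is only a sketch: `wait_for_package` is a hypothetical helper, the 60-second default poll interval is arbitrary, and it assumes `conda search` exits non-zero while the package is not yet visible.

```shell
#!/usr/bin/env bash
# Sketch: poll `conda search` until a package is visible for a platform,
# then print how many whole minutes the wait took.
# wait_for_package is a hypothetical helper, not part of conda.
# Args: package name, platform (subdir), poll interval in seconds (default 60).
wait_for_package() {
  local pkg=$1 platform=$2 interval=${3:-60}
  local start=$SECONDS   # bash's built-in elapsed-seconds counter
  # Assumption: conda search returns a non-zero status until the package is found.
  until conda search --platform "$platform" "$pkg" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo $(( (SECONDS - start) / 60 ))
}

# Usage (needs conda and network access):
#   wait_for_package libcublas linux-aarch64
```

Note that conda also keeps a local repodata cache, so the measured time may slightly overstate the CDN delay itself.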

### Conda Info

$ conda info

     active environment : opt_einsum_dev
    active env location : /home/leof/miniforge3/envs/opt_einsum_dev
            shell level : 2
       user config file : /home/leof/.condarc
 populated config files : /home/leof/miniforge3/.condarc
                          /home/leof/.condarc
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.9.16.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.1=0
                          __glibc=2.31=0
                          __linux=5.8.0=0
                          __unix=0=0
       base environment : /home/leof/miniforge3  (writable)
      conda av data dir : /home/leof/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/leof/miniforge3/pkgs
                          /home/leof/.conda/pkgs
       envs directories : /home/leof/miniforge3/envs
                          /home/leof/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.28.2 CPython/3.9.16 Linux/5.8.0-53-generic ubuntu/20.04.2 glibc/2.31
                UID:GID : 1019:1019
             netrc file : None
           offline mode : False

### Conda Config

```shell
$ conda config --show-sources
==> /home/leof/miniforge3/.condarc <==
channels:
  - conda-forge

==> /home/leof/.condarc <==
channels:
  - conda-forge
```

### Conda list

_No response_

### Additional Context

_No response_

leofang commented 1 year ago

cc: @jakirkham

barabo commented 1 year ago

I'm looking into an issue with one of the Cloudflare caches. It seems that it only caches .tar.bz2 files (we forgot to add .conda when we started putting conda files in the repo for conda-forge), so .conda downloads would likely be much slower.

jakirkham commented 1 year ago

Ah ok. This makes much more sense. Thanks Carl! 🙏

jakirkham commented 1 year ago

Recently saw an issue where a package, libnvjitlink, uploaded binaries for Linux and Windows at roughly the same time. However, the Windows package mirrored much more slowly.

Edit: The Windows packages mentioned here took ~1.5hrs to mirror. This was the original build and this is the first CI build to get the package.

Screenshot showing Linux and Windows packages:

[Screenshot 2023-04-06, 6:35:19 PM]

Note: Download count is `1` because I did a download from the web UI.

Searching for Linux package:

```
$ conda search 'libnvjitlink[channel=conda-forge, subdir=linux-64]'
Loading channels: done
# Name                       Version           Build  Channel
libnvjitlink                 12.0.76      hcb278e6_0  conda-forge
```

Searching for Windows package:

```
$ conda search 'libnvjitlink[channel=conda-forge, subdir=win-64]'
Loading channels: done
No match found for: conda-forge/win-64::libnvjitlink. Search: conda-forge/win-64::*libnvjitlink*

PackagesNotFoundError: The following packages are not available from current channels:

  - conda-forge/win-64::libnvjitlink

Current channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
```

dholth commented 1 year ago

@barabo it makes sense to cache .conda but is it faster? Wouldn't the CDN sync usually be the first downloader of brand-new .conda packages from a cold cache? (But, we see plenty of downloads in the screenshot)

jakirkham commented 1 year ago

Windows is only downloaded once because I clicked the link in the web UI to download it. There were 0 downloads prior and no additional downloads until CDN sync completed ~1.5hrs after upload

jakirkham commented 1 year ago

Noticing this with cuda-nvcc build number 1 on ppc64le (other packages uploaded at the same time are already available):

Package has been up for ~1.75hrs, but is not available from CDN (getting missing package errors when requesting it)

[Screenshot 2023-04-19, 3:45:11 PM]

leofang commented 1 year ago

FYI it took >60 mins to reflect a simple channel label change: https://github.com/conda-forge/admin-requests/pull/710#issuecomment-1520547532

beckermr commented 1 year ago

I want to chime in here that I am seeing CDN sync times on the conda-forge status page of over 15 minutes on a regular basis now.

barabo commented 1 year ago

@dholth and I are going to sync on this early next week. Something does seem to be going on - we'll get to the bottom of it.

jakirkham commented 1 year ago

Thanks Carl! 🙏

Please let us know if you need anything 🙂

dholth commented 1 year ago

We've shortened the cron interval so that updates should happen more frequently. Keep an eye on it and we'll see whether any other part of the pipeline is delayed.

barabo commented 1 year ago

The cron interval was every 10 minutes, which was fine when the job reliably ran in under 7 minutes. It recently started going over 10 for some runs, so we shortened it to 2.

I'm still looking at the logs to see if there's a way to speed it up.
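
A rough way to see why the shorter interval helps. This sketch assumes (it is not confirmed in this thread) that a cron tick is skipped while the previous run is still in progress, so the next run starts at the first tick at or after the previous run finishes. `next_start_gap` is a hypothetical helper, and the 11-minute runtime is an illustrative value, not a measurement.

```shell
#!/usr/bin/env bash
# Sketch: minutes between the start of one sync run and the start of the next.
# Assumption: runs never overlap; a tick that fires mid-run is skipped.
# Args: cron interval in minutes, job runtime in minutes (whole numbers).
next_start_gap() {
  local interval=$1 runtime=$2
  # Round the runtime up to the next multiple of the interval.
  echo $(( ( (runtime + interval - 1) / interval ) * interval ))
}

# next_start_gap 10 7    # 10: the job finishes before the next tick
# next_start_gap 10 11   # 20: the job overruns one tick and waits for the following one
# next_start_gap 2 11    # 12: a short interval wastes at most ~2 minutes
```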

jakirkham commented 1 year ago

Ok, would be interested to know why the script is taking longer. AIUI there was some work in the past to cut down the script runtime pretty significantly.

jezdez commented 1 year ago

Looking at this, I'm uncertain if we've come to a conclusion. @barabo, do you think we can close this?

jakirkham commented 1 year ago

Reading Carl's last comment, my (potentially incorrect) understanding is the cron job used for mirroring is starting to take longer. The cause for this is unknown and being investigated. So not yet fully resolved

dholth commented 1 year ago

I'm not too worried about it yet. The cron job used to take 6-7 minutes, and now it sometimes takes a little longer (which would have caused a 10 minute delay in the past); but sometimes it still runs in < 10 minutes. We should try to vacuum the databases at least.

jakirkham commented 1 year ago

If it is reliably mirroring at 10-minute intervals, great. The issues mentioned above were when 1hr+ mirroring times were seen.

m3vaz commented 1 year ago

@jakirkham @leofang we're still seeing these issues with packages that were posted 23 hours ago, e.g. cuda-python.

barabo commented 1 year ago

Looking into the nvidia clone worker right now. It appears to have gotten stuck 18 hours ago and needed a restart. I believe it's done updating now.

m3vaz commented 1 year ago

@barabo Confirmed, I see the packages now.

Is there a way we could check on sync status for a given channel? (for when we hit similar issues in the future)

barabo commented 1 year ago

I believe you can do something like this to get a sense for when a channel subdir was last updated.

curl -Is https://conda.anaconda.org/nvidia/linux-64/repodata.json | grep last-modified
last-modified: Thu, 29 Jun 2023 18:35:12 GMT

It won't work if there are no new packages in linux-64 for that channel, but if you know that's what you're looking for it should be a good test.
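
The same check can be turned into an age in minutes. This is a sketch under a few assumptions: GNU `date` is available (for `date -d`), and `last_modified_value` and `minutes_since` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: how many minutes ago did a channel subdir's repodata.json last change?
# Assumes GNU date; both function names below are hypothetical.

# Read HTTP response headers on stdin and print the Last-Modified value.
last_modified_value() {
  # Header names are case-insensitive; drop the trailing carriage returns.
  tr -d '\r' | awk 'tolower($1) == "last-modified:" { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Print whole minutes between an HTTP date ($1) and "now" in epoch seconds ($2).
minutes_since() {
  local then
  then=$(date -u -d "$1" +%s) || return 1
  echo $(( ($2 - then) / 60 ))
}

# Usage (needs network access):
#   lm=$(curl -Is https://conda.anaconda.org/nvidia/linux-64/repodata.json | last_modified_value)
#   minutes_since "$lm" "$(date +%s)"
```

The same caveat applies: a large value only indicates a stuck sync if new packages were actually uploaded to that subdir in the meantime.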

m3vaz commented 1 year ago

Can we assume that the update job takes ~10 minutes and is run every 10 minutes (as referenced earlier in the issue)?

jakirkham commented 1 year ago

cc @adibbley (for awareness)

barabo commented 1 year ago

conda-forge syncs every 10 minutes, but I think the nvidia channel (and a few others) only sync every 20 minutes. We can look into increasing that cadence, if necessary.

jakirkham commented 1 year ago

We are seeing this issue with the nvidia channel again. Could someone please take a look?

cc @raydouglass

jezdez commented 1 year ago

@jakirkham we're looking into it

jezdez commented 10 months ago

This was resolved at the time, closing.