devpi / devpi

Python PyPi staging server and packaging, testing, release tool
https://doc.devpi.net
913 stars 136 forks source link

devpi-server not respecting --mirror-cache-expiry argument #1044

Closed epifeny closed 3 months ago

epifeny commented 4 months ago

I am experiencing an issue with the devpi-server where it appears not to respect the --mirror-cache-expiry argument. Despite setting the --mirror-cache-expiry to 43200 seconds (12 hours), the server continues to check pypi.org for updates instead of waiting for the specified cache expiry time.

Steps to Reproduce:

Start the devpi-server with the following command:

devpi-server --serverdir /opt/devpi-server/data --host 0.0.0.0 --port 3141 --mirror-cache-expiry 43200

Verify that the process is running with the expected arguments:

# cat /proc/1/cmdline
/usr/local/bin/python/usr/local/bin/devpi-server--serverdir/opt/devpi-server/data--host0.0.0.0--port3141--mirror-cache-expiry43200

Client Side:

  1. Set up a Python virtual environment and activate it:

    python3 -m venv devpi
    source devpi/bin/activate
  2. Install a package using the devpi-server:

    pip install --no-cache-dir --index-url https://devpi-server.<redacted>.net/devpi/dev --trusted-host devpi-server.<redacted>.net requests

Expected Behavior:

The devpi-server should wait for the allotted --mirror-cache-expiry time (12 hours) before checking pypi.org for updates.

Actual Behavior:

The server checks pypi.org for updates more frequently than the specified --mirror-cache-expiry time.

Environment:

devpi-server version: 6.12.0
Python version: 3.12.4
Operating System: Ubuntu 22.04.1
# uname -v
#17~22.04.1-Ubuntu SMP

Installed Packages:

# pip list
Package              Version
-------------------- -----------
anyio                4.4.0
argon2-cffi          23.1.0
argon2-cffi-bindings 21.2.0
attrs                23.2.0
beautifulsoup4       4.12.3
build                1.2.1
certifi              2024.7.4
cffi                 1.16.0
Chameleon            4.5.4
charset-normalizer   3.3.2
check-manifest       0.49
cmarkgfm             2024.1.14
defusedxml           0.7.1
devpi-client         7.0.3
devpi-common         4.0.4
devpi-server         6.12.0
devpi-web            4.2.2
docutils             0.21.2
h11                  0.14.0
httpcore             1.0.5
httpx                0.27.0
hupper               1.12.1
idna                 3.7
iniconfig            2.0.0
itsdangerous         2.2.0
lazy                 1.6
nh3                  0.2.18
packaging            24.1
packaging-legacy     23.0.post0
passlib              1.7.4
PasteDeploy          3.1.0
pip                  24.0
pkginfo              1.11.1
plaster              1.1.2
plaster-pastedeploy  1.0.1
platformdirs         4.2.2
pluggy               1.5.0
py                   1.11.0
pycparser            2.22
Pygments             2.18.0
pyproject_hooks      1.1.0
pyramid              2.0.2
pyramid_chameleon    0.3
python-dateutil      2.9.0.post0
readme_renderer      43.0
repoze.lru           0.7
requests             2.32.3
ruamel.yaml          0.18.6
ruamel.yaml.clib     0.2.8
setuptools           70.2.0
six                  1.16.0
sniffio              1.3.1
soupsieve            2.5
strictyaml           1.7.3
translationstring    1.4
urllib3              2.2.2
venusian             3.1.0
waitress             3.0.0
WebOb                1.8.7
wheel                0.43.0
Whoosh               2.7.4
zope.deprecation     5.0
zope.interface       6.4.post2

Server Logs:

2024-07-11 05:09:14,721 INFO  [req111] GET /devpi/dev/requests/
2024-07-11 05:09:14,730 INFO  [req112] GET /devpi/dev/requests
2024-07-11 05:09:14,796 INFO  [req113] GET /root/pypi/+f/707/61cfe03c773ce/requests-2.32.3-py3-none-any.whl
2024-07-11 05:09:14,814 INFO  [req114] GET /devpi/dev/charset-normalizer/
2024-07-11 05:09:14,821 INFO  [req115] GET /devpi/dev/charset-normalizer
2024-07-11 05:09:14,843 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:09:15,005 INFO  [req116] GET /root/pypi/+f/846/5322196c8b4d7/charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
2024-07-11 05:09:15,015 INFO  [req117] GET /devpi/dev/idna/
2024-07-11 05:09:15,021 INFO  [req118] GET /devpi/dev/idna
2024-07-11 05:09:15,034 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:09:15,055 INFO  [req119] GET /root/pypi/+f/82f/ee1fc78add434/idna-3.7-py3-none-any.whl
2024-07-11 05:09:15,064 INFO  [req120] GET /devpi/dev/urllib3/
2024-07-11 05:09:15,070 INFO  [req121] GET /devpi/dev/urllib3
2024-07-11 05:09:15,085 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:09:15,139 INFO  [req122] GET /root/pypi/+f/a44/8b2f64d686155/urllib3-2.2.2-py3-none-any.whl
2024-07-11 05:09:15,149 INFO  [req123] GET /devpi/dev/certifi/
2024-07-11 05:09:15,156 INFO  [req124] GET /devpi/dev/certifi
2024-07-11 05:09:15,170 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:09:15,201 INFO  [req125] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl

Question:

What am I doing wrong, or what did I not fully understand about the --mirror-cache-expiry argument?

Thank you for your assistance.

fschulze commented 4 months ago

The setting is per project and resets on restart of the server.

epifeny commented 4 months ago

The setting is per project and resets on restart of the server.

Thanks for your reply. Per project you mean individually for requests, charset_normalizer, idna, urllib3 and certifi? I mean, they all keep checking pypi.org even though they have been downloaded and haven't changed.

The server hasn't been restarted. It's the same running instance.

epifeny commented 4 months ago

The setting is per project and resets on restart of the server.

Here, you can see that I keep trying to install just certifi and the server keeps checking pypi.org over and over again

2024-07-11 05:32:59,406 INFO  [req144] GET /devpi/dev/certifi/
2024-07-11 05:32:59,416 INFO  [req145] GET /devpi/dev/certifi
2024-07-11 05:32:59,430 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:32:59,483 INFO  [req146] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:33:24,867 INFO  [req147] GET /devpi/dev/certifi/
2024-07-11 05:33:24,876 INFO  [req148] GET /devpi/dev/certifi
2024-07-11 05:33:24,891 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:33:24,932 INFO  [req149] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:33:29,345 INFO  [req150] GET /devpi/dev/certifi/
2024-07-11 05:33:29,354 INFO  [req151] GET /devpi/dev/certifi
2024-07-11 05:33:29,368 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:33:29,440 INFO  [req152] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:33:33,475 INFO  [req153] GET /devpi/dev/certifi/
2024-07-11 05:33:33,483 INFO  [req154] GET /devpi/dev/certifi
2024-07-11 05:33:33,498 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:33:33,537 INFO  [req155] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
fschulze commented 4 months ago

Could you try the same directly on /root/pypi/certifi instead of going through /devpi/dev/certifi which seems to inherit from root/pypi?

epifeny commented 4 months ago

Could you try the same directly on /root/pypi/certifi instead of going through /devpi/dev/certifi which seems to inherit from root/pypi?

It's the same issue

2024-07-11 05:56:49,852 INFO  [req168] GET /root/pypi/certifi/
2024-07-11 05:56:49,861 INFO  [req169] GET /root/pypi/certifi
2024-07-11 05:56:49,880 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:56:49,919 INFO  [req170] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:56:58,512 INFO  [req171] GET /root/pypi/certifi/
2024-07-11 05:56:58,522 INFO  [req172] GET /root/pypi/certifi
2024-07-11 05:56:58,541 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:56:58,615 INFO  [req173] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:57:17,277 INFO  [req174] GET /root/pypi/certifi/
2024-07-11 05:57:17,287 INFO  [req175] GET /root/pypi/certifi
2024-07-11 05:57:17,304 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:57:17,357 INFO  [req176] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
2024-07-11 05:57:30,727 INFO  [req177] GET /root/pypi/certifi/
2024-07-11 05:57:30,736 INFO  [req178] GET /root/pypi/certifi
2024-07-11 05:57:30,754 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-11 05:57:30,806 INFO  [req179] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl
epifeny commented 4 months ago

I decided to recreate everything from scratch and can't reproduce the issue I had.

epifeny commented 3 months ago

I actually got this happening again. I have the server running for nearly a week, and all the packages are again showing 304. devpi server keeps checking pypi.org for every package.

2024-07-23 11:02:54,643 INFO  [req42959] GET /root/pypi/certifi/
2024-07-23 11:02:54,659 INFO  [req42960] GET /root/pypi/certifi
2024-07-23 11:02:54,702 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-23 11:02:54,763 INFO  [req42961] GET /root/pypi/+f/c19/8e21b1289c2ab/certifi-2024.7.4-py3-none-any.whl

This is the process

# ps auxfww | grep -v grep | grep devpi-server
root           1  0.3  0.2 4596456 1129620 ?     Ssl  Jul17  34:20 /usr/local/bin/python /usr/local/bin/devpi-server --serverdir /opt/devpi-server/data --host 0.0.0.0 --port 3141 --mirror-cache-expiry 43200
fschulze commented 3 months ago

It certainly looks like it doesn't happen right away. I see the same behavior on https://m.devpi.net which is running for a while now. I can't reproduce it locally yet.

fschulze commented 3 months ago

To quickly reproduce the problem use --mirror-cache-expiry=10.

With devpi-web it is even worse, as it causes a HEAD request for each version of the package.

Could you test with this change:

diff --git a/server/devpi_server/mirror.py b/server/devpi_server/mirror.py
index f29a6d5df..8d37fa73b 100644
--- a/server/devpi_server/mirror.py
+++ b/server/devpi_server/mirror.py
@@ -795,8 +795,8 @@ class MirrorStage(BaseStage):
                 f"timeout after {self.timeout} seconds while getting data for {project!r}")
         except self.UpstreamNotModified as e:
             if links is not None:
-                self.keyfs.tx.on_commit_success(partial(
-                    self.cache_retrieve_times.refresh, project, e.etag))
+                # immediately update the cache
+                self.cache_retrieve_times.refresh(project, e.etag)
                 return self.SimpleLinks(links)
             if e.etag is None:
                 threadlog.error(
epifeny commented 3 months ago

It's been around 8 hours since I started the server, not 304 Not Modified yet. Just a few expected 200 OK. I will continue to monitor this and see when I start getting 304 Not Modified, maybe it's related to what I have defined in the --mirror-cache-expiry 43200. I'll keep you posted.

epifeny commented 3 months ago

To quickly reproduce the problem use --mirror-cache-expiry=10.

With devpi-web it is even worse, as it causes a HEAD request for each version of the package.

Could you test with this change:

diff --git a/server/devpi_server/mirror.py b/server/devpi_server/mirror.py
index f29a6d5df..8d37fa73b 100644
--- a/server/devpi_server/mirror.py
+++ b/server/devpi_server/mirror.py
@@ -795,8 +795,8 @@ class MirrorStage(BaseStage):
                 f"timeout after {self.timeout} seconds while getting data for {project!r}")
         except self.UpstreamNotModified as e:
             if links is not None:
-                self.keyfs.tx.on_commit_success(partial(
-                    self.cache_retrieve_times.refresh, project, e.etag))
+                # immediately update the cache
+                self.cache_retrieve_times.refresh(project, e.etag)
                 return self.SimpleLinks(links)
             if e.etag is None:
                 threadlog.error(

Yes. I will try it out but it's not gonna be easy since I'm running it via Kuberenetes and pushing a change into a file inside a pod is going to be challenging. I would rather you test it, push a fix with an update to the devpi-server package that I can install and test.

epifeny commented 3 months ago

I started the server with --mirror-cache-expiry 5:

# ps auxfww | grep -v grep | grep devpi
root           1  115  0.1 4648436 616040 ?      Ssl  20:56   2:50 /usr/local/bin/python /usr/local/bin/devpi-server --serverdir /opt/devpi-server/data --host 0.0.0.0 --port 3141 --mirror-cache-expiry 5

with the change in place:

# grep -C 10 'immediately update the cache' /usr/local/lib/python3.12/site-packages/devpi_server/mirror.py
                    self._update_simplelinks_in_future(newlinks_future, project, lock.defer()))
            if links is not None:
                threadlog.warn(
                    "serving stale links for %r, getting data timed out after %s seconds",
                    project, self.timeout)
                return self.SimpleLinks(links, stale=True)
            raise self.UpstreamError(
                f"timeout after {self.timeout} seconds while getting data for {project!r}")
        except self.UpstreamNotModified as e:
            if links is not None:
                # immediately update the cache
                self.cache_retrieve_times.refresh(project, e.etag)
                return self.SimpleLinks(links)
            if e.etag is None:
                threadlog.error(
                    "server returned 304 Not Modified, but we have no links")
                raise
            # should not happen, but clear ETag and try again
            self.cache_retrieve_times.expire(project, etag=None)
            return self.get_simplelinks_perstage(project)
        except (self.UpstreamNotFoundError, self.UpstreamError) as e:

at this time it started serving:

2024-07-23 20:56:36,599 INFO  Serving on http://0.0.0.0:3141

and just a few moments after, I started getting 304:

2024-07-23 20:56:43,881 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 200 OK"
2024-07-23 20:56:44,218 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 200 OK"
2024-07-23 20:56:44,465 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 200 OK"
2024-07-23 20:56:45,205 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 200 OK"
2024-07-23 20:56:45,404 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 200 OK"
2024-07-23 20:57:07,094 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:07,299 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:07,574 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:07,693 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:07,835 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:20,289 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:20,523 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:20,676 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:20,874 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-23 20:57:21,073 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
fschulze commented 3 months ago

With just 5 seconds the 304s are expected as is indicated by the time difference. Please try with the patch, but with the default expiry.

epifeny commented 3 months ago

You're absolutely correct! I intended to set it to 5 minutes but didn't remember that this variable is in seconds, not minutes. I've now set it to 120 seconds to do a quick test. The cache expires every 120 seconds, so devpi-server does a recheck on the metadata as expected.

2024-07-24 06:07:36,870 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:07:37,146 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:07:37,373 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:07:37,568 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:07:37,760 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:09:46,239 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:09:46,553 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:09:46,739 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:09:47,087 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:09:47,216 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:11:47,437 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:11:47,825 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:11:48,079 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:11:48,237 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:11:48,644 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:13:58,135 INFO  HTTP Request: GET https://pypi.org/simple/requests/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:13:58,279 INFO  HTTP Request: GET https://pypi.org/simple/idna/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:13:58,598 INFO  HTTP Request: GET https://pypi.org/simple/certifi/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:13:58,851 INFO  HTTP Request: GET https://pypi.org/simple/charset-normalizer/ "HTTP/1.1 304 Not Modified"
2024-07-24 06:13:59,309 INFO  HTTP Request: GET https://pypi.org/simple/urllib3/ "HTTP/1.1 304 Not Modified"