Debian / debiman

debiman generates a static manpage HTML repository out of a Debian archive

Testing version doesn't sync with archive after migration #181

Open nc7s opened 1 week ago

nc7s commented 1 week ago

After dh-shell-completions 0.0.3 had been in testing for a while (migrated on 22 Sep, problem found on 18 Oct), manpages.debian.org still had its testing version at 0.0.2. Now that 0.0.4 has been uploaded, the manpages.d.o versions are in sync (testing 0.0.3, unstable 0.0.4), so I suspect that only uploads trigger updates, not migrations.

stapelberg commented 1 week ago

Hey, thanks for your report.

You’re right that something seems off here, but your suspicion is not correct: debiman does not know about uploads or migrations, it always goes through the list of packages currently in the Debian archive.

However, I think there is a bug in cache invalidation that I have now tracked down based on this timeline:

This is what the Debian package tracker lists:

This is what the debiman logfiles say, annotated for clarity with the resulting state on disk:

TZ=Europe/Zurich journalctl --root=2024-09-19 --since 2024-09-15 -u debiman --grep dh_shell_completions | cat
# rendering both versions because 0.0.2 migrated to testing
Sep 16 05:03:05 ex622 run-debiman.bash[1701967]: 2024/09/16 05:03:05 render.go:296: /srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.html.gz invalidated by /srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.gz
Sep 16 05:03:05 ex622 run-debiman.bash[1701967]: 2024/09/16 05:03:05 rendermanpage.go:322: rendering "/srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.html.gz"
Sep 16 05:03:05 ex622 run-debiman.bash[1701967]: 2024/09/16 05:03:05 rendermanpage.go:322: rendering "/srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.html.gz"
# -rw-r--r-- 1 root root 2,0K 2024-09-09 00:38 testing/dh_shell_completions.1.en.gz
# -rw-r--r-- 1 root root 4,8K 2024-09-16 05:03 testing/dh_shell_completions.1.en.html.gz
# -rw-r--r-- 1 root root 4,8K 2024-09-16 05:03 unstable/dh_shell_completions.1.en.html.gz

# rendering both versions because 0.0.3 entered unstable
Sep 17 05:03:25 ex622 run-debiman.bash[1813534]: 2024/09/17 05:03:25 render.go:296: /srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.html.gz invalidated by /srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.gz
Sep 17 05:03:25 ex622 run-debiman.bash[1813534]: 2024/09/17 05:03:25 rendermanpage.go:322: rendering "/srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.html.gz"
Sep 17 05:03:25 ex622 run-debiman.bash[1813534]: 2024/09/17 05:03:25 rendermanpage.go:322: rendering "/srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.html.gz"
# -rw-r--r-- 1 root root 2,0K 2024-09-16 21:43 unstable/dh_shell_completions.1.en.gz
# -rw-r--r-- 1 root root 5,5K 2024-09-17 05:03 unstable/dh_shell_completions.1.en.html.gz
# -rw-r--r-- 1 root root 5,5K 2024-09-17 05:03 testing/dh_shell_completions.1.en.html.gz

# NOTE: The log for 2024-09-22 does not contain any mention of dh_shell_completions!
# most likely cause: 
# 1. debiman extracts the manpage to testing/dh_shell_completions.1.en.gz with modtime 2024-09-16 21:43
# 2. because the mod time of the raw manpage (2024-09-16 21:43) is older than the HTML version (2024-09-17 05:03), debiman assumes the HTML version is up to date and does not need to be re-generated.

# rendering both versions because 0.0.4 entered unstable
Oct 19 23:03:32 ex622 run-debiman.bash[1822969]: 2024/10/19 23:03:32 render.go:296: /srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.html.gz invalidated by /srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.gz
Oct 19 23:03:32 ex622 run-debiman.bash[1822969]: 2024/10/19 23:03:32 rendermanpage.go:322: rendering "/srv/man/www/testing/dh-shell-completions/dh_shell_completions.1.en.html.gz"
Oct 19 23:03:32 ex622 run-debiman.bash[1822969]: 2024/10/19 23:03:32 rendermanpage.go:322: rendering "/srv/man/www/unstable/dh-shell-completions/dh_shell_completions.1.en.html.gz"

This is the state on disk:

% ls -hltr /srv/man/www/unstable/dh-shell-completions/ && head /srv/man/www/unstable/dh-shell-completions/VERSION 
total 20K
-rw-r--r-- 1 root root 2,0K 2024-10-19 16:38 dh_shell_completions.1.en.gz
-rw-r--r-- 1 root root    5 2024-10-19 23:02 VERSION
-rw-r--r-- 1 root root 3,5K 2024-10-19 23:03 index.html.gz
-rw-r--r-- 1 root root 5,5K 2024-10-19 23:03 dh_shell_completions.1.en.html.gz
0.0.4#

% ls -hltr /srv/man/www/testing/dh-shell-completions/ && head /srv/man/www/testing/dh-shell-completions/VERSION        
total 20K
-rw-r--r-- 1 root root 2,0K 2024-09-16 21:43 dh_shell_completions.1.en.gz
-rw-r--r-- 1 root root    5 2024-09-22 05:00 VERSION
-rw-r--r-- 1 root root 3,5K 2024-09-22 05:03 index.html.gz
-rw-r--r-- 1 root root 4,8K 2024-10-19 23:03 dh_shell_completions.1.en.html.gz
0.0.3#

So the problem consists of multiple parts:

  1. When extracting the raw manpage, we override its modtime with the one stored in the archive (i.e. the modtime set by the uploader): https://github.com/Debian/debiman/blob/4afba3a1fef8fc4215d59c13926082ac1001a987/cmd/debiman/download.go#L409
  2. Invalidating other versions of a manpage updates the HTML modtime, but re-uses the content (as an optimization).
  3. This breaks the assumption we make here: If the HTML version is more recent than the raw manpage, it must also contain the contents of that raw manpage: https://github.com/Debian/debiman/blob/4afba3a1fef8fc4215d59c13926082ac1001a987/cmd/debiman/render.go#L250
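
To make the interaction of these three parts concrete, here is a minimal sketch in Go that replays the timestamps from the logs above. It is not the actual debiman code: `upToDate` and `mustParse` are hypothetical helpers, and the comparison is only assumed to resemble the check linked in part 3. The extraction time of 2024-09-22 05:00 is also an assumption, taken from the modtime of the testing VERSION file.

```go
package main

import (
	"fmt"
	"time"
)

// upToDate is a hypothetical stand-in for the freshness check in render.go:
// the HTML is treated as current if its modtime is newer than the raw manpage's.
func upToDate(rawMod, htmlMod time.Time) bool {
	return htmlMod.After(rawMod)
}

// mustParse parses a "YYYY-MM-DD HH:MM" timestamp as printed by ls above.
func mustParse(s string) time.Time {
	t, err := time.Parse("2006-01-02 15:04", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Sep 17: 0.0.3 enters unstable. The testing HTML is invalidated and
	// rewritten, which bumps its modtime (part 2).
	testingHTML := mustParse("2024-09-17 05:03")

	// Sep 22: 0.0.3 migrates to testing. The raw manpage is extracted, but its
	// modtime is set to the one stored in the archive (part 1).
	rawArchiveModtime := mustParse("2024-09-16 21:43")

	// The check concludes that the HTML is current, so rendering is skipped (part 3).
	fmt.Println(upToDate(rawArchiveModtime, testingHTML)) // prints true

	// For contrast only (assumption): had the raw manpage carried the
	// extraction time instead, the same check would have requested a re-render.
	rawExtractionTime := mustParse("2024-09-22 05:00")
	fmt.Println(upToDate(rawExtractionTime, testingHTML)) // prints false
}
```

The second comparison is only there to show which input the check is sensitive to; it is not meant as a recommendation for how to fix the issue.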

So, what can we do to fix the issue?

I’m not sure yet which path I like most. Maybe option 2 deserves a shot, and if it turns out to be too hard for some reason, we can resort to option 3.

I can kick off a run with a forced full re-rendering to get the current manpage archive fixed (will take a few days to complete and propagate, though).

stapelberg commented 1 week ago

> I can kick off a run with a forced full re-rendering to get the current manpage archive fixed (will take a few days to complete and propagate, though).

Looks like this was a bit quicker than expected: the corrected version now seems to be live.

nc7s commented 1 week ago

Thanks for the detailed analysis. Some rough thoughts: