Debian / debiman

debiman generates a static manpage HTML repository out of a Debian archive
Apache License 2.0

Different versions of localized manpage in unstable / testing #110

Open toddy15 opened 5 years ago

toddy15 commented 5 years ago

Hi,

The manpage machine-id.5 has been translated to German and is available as a localized manpage on manpages.debian.org. However, although the package versions in unstable and testing are identical, the rendered text of the manpage differs.

Unstable: https://manpages.debian.org/unstable/manpages-de/machine-id.5.de.html

Source file: machine-id.5.de.gz (from manpages-de 2.10-2)
Source last updated: 2019-01-14T09:41:17Z
Converted to HTML: 2019-01-28T16:14:36Z

Testing: https://manpages.debian.org/testing/manpages-de/machine-id.5.de.html

Source file: machine-id.5.de.gz (from manpages-de 2.10-2)
Source last updated: 2019-01-14T09:41:17Z
Converted to HTML: 2019-01-28T16:14:36Z

stapelberg commented 5 years ago

debiman re-uses rendered content for efficiency, so my first guess would be that the relevant logic did not work correctly in this particular case.

In particular, debiman re-uses content when the mtime of the HTML is after the mtime of the manpage.

stapelberg@manziarly:/srv/manpages.debian.org/www/testing/manpages-de$ TZ=UTC ls -l machine-id*                 
-rw-r--r-- 1 manpages manpages 3423 Jan 14 09:41 machine-id.5.de.gz
-rw-r--r-- 1 manpages manpages 8697 Feb  7 04:25 machine-id.5.de.html.gz

Unfortunately, the log files of the corresponding debiman run have already been rotated away.

Looking at data.tar.xz inside https://snapshot.debian.org/archive/debian/20190114T151156Z/pool/main/m/manpages-de/manpages-de_2.10-2_all.deb, we see:

% TZ=UTC tar tvf data.tar.xz | grep machine-id.5
-rw-r--r-- root/root      3432 2019-01-14 09:41 ./usr/share/man/de/man5/machine-id.5.gz

The changes file (from https://tracker.debian.org/news/1020812/accepted-manpages-de-210-2-source-into-unstable/) contains Date: Mon, 14 Jan 2019 10:41:17 +0100, i.e. 09:41:17 UTC. That matches the 09:41 mtime in the tar listing and is before the message timestamp of Mon, 14 Jan 2019 10:04:55 +0000, so the mtime in the .deb file seems to be correct.

My current theory is that the mtime of machine-id.5.de.html.gz was updated because a change in another package invalidated it, and that this raced with debiman picking up the updated machine-id.5.de.gz.

Unfortunately, we can’t just fake the mtime to fix this race, because it is used for rsync'ing the files to our static serving infrastructure. Instead, we should probably make the re-use code use the “last updated” timestamp encoded in the file, not the file mtime.