fedora-infra / mirrormanager2

Rewrite of the MirrorManager application in Flask and SQLAlchemy
https://mirrormanager.fedoraproject.org
GNU General Public License v2.0

Tracker for fullfilelist-powered UMDL #206

Closed puiterwijk closed 7 years ago

puiterwijk commented 7 years ago

@adrianreber was working on an improved UMDL that uses fullfilelist. This ticket is just to track that, so interested people can follow it.

adrianreber commented 7 years ago

Running final tests for the fullfiletimelist-* support.

The goal of the fullfiletimelist-* based update-master-directory-listing (umdl) is to be able to 'scan' the master mirror without stat(). The main reason is that the filesystem trees umdl works on have become huge, and scanning all those files produces lots of IOPS just to detect that nothing changed.

With the help of https://pagure.io/quick-fedora-mirror/, new files (fullfiletimelist-*) are created for each mirror category.

This new umdl approach relies on the correctness of those files: if they are wrong, MirrorManager will have the wrong state. But because it removes (almost) all need to call stat(), it is much faster, between 5 and 10 times.
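To illustrate how a file list can stand in for a tree walk, here is a minimal sketch. The line layout it parses (a `[Files]` section with tab-separated timestamp, entry type, size and path) is an assumption made for this example and is not taken from this ticket; `Entry` and `parse_file_list` are hypothetical names, not part of umdl.

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    timestamp: int  # time recorded in the list (assumed to be an integer)
    kind: str       # assumed type code, e.g. "f" for file, "d" for directory
    size: int
    path: str


def parse_file_list(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from the assumed [Files] section without touching disk."""
    in_files = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("["):
            # A section header such as [Files] switches parsing on or off.
            in_files = line == "[Files]"
            continue
        if not in_files or not line:
            continue
        fields = line.split("\t", 3)
        if len(fields) != 4:
            continue  # skip lines that do not match the assumed layout
        timestamp, kind, size, path = fields
        yield Entry(int(timestamp), kind, int(size), path)
```

Every piece of information comes from the list itself, so no stat() call is made per file. This is also why a wrong list leads directly to a wrong state.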

The current implementation still needs to stat the directories to see whether they are readable, but there is a feature request (https://pagure.io/quick-fedora-mirror/issue/40) to include the readability information in the fullfiletimelist-* files.

Calculating the repomd.xml checksums and reading the -CHECKSUM files of the ISOs still requires disk access (this could also be done via https or rsync), but only if the directory ctime actually changed (according to fullfiletimelist-*).
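The "only if the ctime changed" rule can be sketched as a comparison between the ctimes read from the list and the ctimes stored from the previous run. This is an illustration under assumptions: `directories_needing_read` is a hypothetical helper, and the two dictionaries (directory path to ctime) stand in for whatever umdl actually keeps in its database.

```python
from typing import Dict, List


def directories_needing_read(
    listed: Dict[str, int], known: Dict[str, int]
) -> List[str]:
    """Return directories whose listed ctime differs from the stored one."""
    changed = []
    for path, ctime in sorted(listed.items()):
        # A directory qualifies if it is new (absent from `known`)
        # or if its ctime in the list differs from the stored value.
        if known.get(path) != ctime:
            changed.append(path)
    return changed
```

Only the returned directories would then be opened to recompute repomd.xml checksums or read -CHECKSUM files; all others are skipped without any disk access.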

The fullfiletimelist-* files are detected automatically; if they do not exist, umdl falls back to directory walking.
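The detect-then-fall-back behaviour could look roughly like the sketch below. It assumes the list sits directly in the top directory of a category, which this ticket does not state; `find_file_list`, `walk_directories` and `choose_scan_mode` are hypothetical names, not umdl functions.

```python
import os
from typing import Iterator, Optional


def find_file_list(category_root: str) -> Optional[str]:
    """Return the path of a fullfiletimelist-* file in category_root, if any."""
    names = sorted(
        name
        for name in os.listdir(category_root)
        if name.startswith("fullfiletimelist-")
    )
    return os.path.join(category_root, names[0]) if names else None


def walk_directories(category_root: str) -> Iterator[str]:
    """Fallback: visit every directory below category_root on disk."""
    for dirpath, _dirnames, _filenames in os.walk(category_root):
        yield dirpath


def choose_scan_mode(category_root: str) -> str:
    """Pick the list-based scan when a list exists, else the tree walk."""
    return "filelist" if find_file_list(category_root) else "walk"
```

With this shape, categories that do not yet provide a list keep working through the slower walk, and nothing has to be configured per category.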

adrianreber commented 7 years ago

This is now running on the Fedora production systems.