There was some confusion about the list of files checked by the crawler.
The original code looped over the keys of a dict, which seemed an
unlikely result for a database query to return. The PR
https://github.com/fedora-infra/mirrormanager2/pull/107
changed it to loop over an existing database query result list. As a
result of this change, however, the crawler only looked at repodata
directories:
https://github.com/fedora-infra/mirrormanager2/issues/131
The reason the crawler actually has to loop over the keys of a dict is
that umdl reads each directory and creates a dict of the (roughly 10)
newest files in that directory. This dict is then pickled, stored in the
database, and read back by the crawler.
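The umdl/crawler handoff described above can be sketched roughly as
follows. This is an illustrative simplification, not the actual
MirrorManager2 code; the function name, the mtime-based ordering, and
the field layout are assumptions made for the example:

```python
import os
import pickle
import tempfile

def newest_files_dict(directory, limit=10):
    """Map filename -> mtime for the `limit` newest files in `directory`.

    A hypothetical stand-in for what umdl builds per directory; the
    real code and stored fields differ.
    """
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getmtime(path)))
    # Newest first, then keep at most `limit` entries.
    entries.sort(key=lambda e: e[1], reverse=True)
    return dict(entries[:limit])

# Demonstrate the round trip with a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("repomd.xml", "primary.xml.gz", "filelists.xml.gz"):
        open(os.path.join(d, name), "w").close()

    # umdl side: pickle the dict for storage in the database.
    blob = pickle.dumps(newest_files_dict(d))

    # crawler side: unpickle and iterate over the keys (the filenames);
    # this is the loop over the pickled dict that the change restores.
    files = sorted(pickle.loads(blob))

print(files)
```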
Instead of simply reverting the commit which removed the loop over the dict, this change keeps all other improvements and only changes the loop to use the pickled dict again.
Successfully tested in the staging environment.
Signed-off-by: Adrian Reber <adrian@lisas.de>