Closed Flashfire42 closed 7 months ago
Every ignore pattern adds computational overhead because it needs to be checked for every URL. So the global igset should generally be kept as small as possible (and I think could use some cleanup).
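To make the overhead concrete, here is a minimal sketch of per-URL pattern checking. It is not ArchiveBot's actual implementation: the patterns, the names `GLOBAL_PATTERNS` and `EXTRA_PATTERNS`, and the helper functions are all hypothetical and exist only for illustration. It assumes each URL is tested against each pattern in turn until one matches.

```python
import re

# Hypothetical patterns for illustration only; not taken from any real igset.
GLOBAL_PATTERNS = [r"/calendar/\d{4}/", r"[?&]sessionid="]
EXTRA_PATTERNS = [r"/video/embed/", r"[?&]replytocom=\d+"]


def compile_patterns(patterns):
    """Compile the pattern strings once so each URL check reuses them."""
    return [re.compile(p) for p in patterns]


def is_ignored(url, compiled):
    """Return (ignored, checks), where checks counts the patterns tried."""
    checks = 0
    for pattern in compiled:
        checks += 1
        if pattern.search(url):
            return True, checks
    return False, checks


def total_checks(urls, compiled):
    """Sum the number of pattern checks performed across all URLs."""
    return sum(is_ignored(url, compiled)[1] for url in urls)
```

In this sketch, a URL that matches nothing costs one check per pattern, so doubling the number of patterns doubles the work for every URL that is not ignored.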
Some quick statistics for the first 9 months of this year from my log files: 73237 jobs, 38872 (53.1 %) of them recursive. The top ignore sets are 10698 (14.6 %) badvideos, 8572 (11.7 %) blogs, 3616 (4.9 %) notweets, 1996 (2.7 %) forums.
In other words, over 85 % of jobs never get igsets beyond global. Some of them might retrieve URLs that would get ignored, but still, the vast majority would just be slowed down by adding these igsets.
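The percentages quoted above can be reproduced from the raw counts. This is only a small arithmetic sketch; the names `share`, `TOTAL_JOBS`, and `counts` are made up for the example, and the numbers are the ones given in the comment.

```python
def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL_JOBS = 73237  # jobs in the first 9 months of the year, as quoted

# Job counts per category, as quoted in the comment.
counts = {
    "recursive": 38872,
    "badvideos": 10698,
    "blogs": 8572,
    "notweets": 3616,
    "forums": 1996,
}

shares = {name: share(n, TOTAL_JOBS) for name, n in counts.items()}
```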
This is not a fix for your lazy job submission practice.
Would it be wise to roll the badvideos igset and some of the blogs igset into the global ignores? They are the two most commonly applied igsets, so surely some of their patterns could be moved into the global ignores.