For some large wikis, including enwiki, commonswiki and wikidatawiki, it currently takes several days to process the linktarget table. This is a problem: if the daily pipeline job hasn’t finished within 24 hours, Toolforge will kill and restart the process, so a run that needs several days cannot finish. (The watchdog itself is good; we do want one in place in case the pipeline gets stuck.)
To make this go faster, use a different join order. This will save time because pages without Wikidata IDs will get dropped earlier than they are now. Also, we're currently re-sorting the contents of linktarget even though the SQL dump is already sorted by primary key, so that sort can be skipped.
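A rough sketch of the idea in Python, not the pipeline's actual code: rows for pages without a Wikidata ID are filtered out first, and the remaining rows are then merge-joined against linktarget in a single pass that relies on the dump's primary-key order instead of sorting again. The names `keep_pages_with_ids`, `merge_join` and `page_to_qid`, the tuple layouts, and the sample data are all hypothetical. The sketch also assumes the link rows arrive sorted by target ID.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row shapes, for illustration only.
Link = Tuple[int, int]        # (target_id, source_page_id), sorted by target_id
LinkTarget = Tuple[int, str]  # (target_id, title), sorted by primary key


def keep_pages_with_ids(
    links: Iterable[Link], page_to_qid: Dict[int, str]
) -> Iterator[Tuple[int, str]]:
    """Drop links from pages without a Wikidata ID before the join."""
    for target_id, page_id in links:
        qid = page_to_qid.get(page_id)
        if qid is not None:
            yield target_id, qid


def merge_join(
    links: Iterable[Tuple[int, str]], linktargets: Iterable[LinkTarget]
) -> Iterator[Tuple[str, str]]:
    """Join two streams that are both sorted by target_id, without re-sorting."""
    targets = iter(linktargets)
    current = next(targets, None)
    for target_id, qid in links:
        # Advance the linktarget stream until it catches up with this link.
        while current is not None and current[0] < target_id:
            current = next(targets, None)
        if current is not None and current[0] == target_id:
            yield qid, current[1]


# Made-up sample data.
page_to_qid = {10: "Q1", 12: "Q2"}            # page 11 has no Wikidata ID
links = [(1, 10), (1, 11), (3, 12), (4, 10)]  # sorted by target_id
linktargets = [(1, "Alpha"), (2, "Beta"), (3, "Gamma")]

result = list(merge_join(keep_pages_with_ids(links, page_to_qid), linktargets))
print(result)
```

Because both inputs are consumed as streams, memory use stays constant regardless of table size, and the early filter shrinks the input before the join ever sees it.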