gnosygnu / xowa

xowa offline wiki application

commons categories #294

Open desb42 opened 5 years ago

desb42 commented 5 years ago

As I started to write up this issue, GitHub indicated some other outstanding issues, viz. #178 and #171, that seem to be related to categories (and other things).

My particular issue is with the page commons.wikimedia.org/wiki/Commons:Featured_pictures, which for me looks like this (image: commons_err). Tracking this down, it turns out to be caused by a bit of wikitext, specifically:

{{PAGESINCATEGORY:Featured pictures on Wikimedia Commons|files|R}}

This, as the template suggests, tries to look up Featured pictures on Wikimedia Commons in the categories. But lo, there are no category databases (the root one being commons.wikimedia.org-xtn.category.core.xowa). For the record, this is commons downloaded 2017-06.

So, in an effort to see if I could resolve this (and as I had started to play with wiki dumps of 2018-11-01), I thought I would try to build commons with categories. The files I downloaded were:

commonswiki-20181101-categorylinks.sql.gz      5,946,324,847
commonswiki-20181101-image.sql.gz             28,245,435,700
commonswiki-20181101-pages-articles.xml.bz2    8,509,219,937
commonswiki-20181101-page_props.sql.gz         1,655,398,845

these expanded to:

commonswiki-20181101-categorylinks.sql        55,714,829,882
commonswiki-20181101-image.sql               128,148,214,022
commonswiki-20181101-pages-articles.xml       87,656,888,559
commonswiki-20181101-page_props.sql            3,943,397,466

Given the discussion in #268, I added xowa.api.bldr.wiki.import.cat_link_db_max = 3600; to the end of my xowa.gfs file and fired up a build.

9h 4m 2s later I ended up with the following category databases:

commons.wikimedia.org-xtn.category.core.xowa              108,175,360
commons.wikimedia.org-xtn.category.link-db.001.xowa     3,952,017,408
commons.wikimedia.org-xtn.category.link-db.002.xowa     3,999,875,072
commons.wikimedia.org-xtn.category.link-db.003.xowa     4,006,154,240
commons.wikimedia.org-xtn.category.link-db.004.xowa     4,012,261,376
commons.wikimedia.org-xtn.category.link-db.005.xowa     4,009,504,768
commons.wikimedia.org-xtn.category.link-db.006.xowa     4,005,167,104
commons.wikimedia.org-xtn.category.link-db.007.xowa     4,009,791,488
commons.wikimedia.org-xtn.category.link-db.008.xowa     4,003,123,200
commons.wikimedia.org-xtn.category.link-db.009.xowa     4,010,426,368
commons.wikimedia.org-xtn.category.link-db.010.xowa     4,003,315,712
commons.wikimedia.org-xtn.category.link-db.011.xowa     4,006,424,576
commons.wikimedia.org-xtn.category.link-db.012.xowa     4,007,567,360
commons.wikimedia.org-xtn.category.link-db.013.xowa     4,005,003,264
commons.wikimedia.org-xtn.category.link-db.014.xowa     2,556,116,992

This is beyond the 10 attached table limit (even with a doubling of the size of the link databases). However, the template problem has been fixed by this build (image: commons_good).
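For anyone wanting to check the limit on their own machine, here is a minimal sketch. It assumes the "attached table limit" above is SQLite's cap on attached databases, whose compile-time default (SQLITE_MAX_ATTACHED) is 10. The function name and the link_db_NNN aliases are made up for illustration and are not part of xowa; the probe uses Python's standard sqlite3 module and in-memory databases rather than real .xowa files.

```python
import sqlite3


def count_attachable(probe_max=200):
    """Attach in-memory databases until SQLite refuses; return how many succeeded.

    Hypothetical helper: it only probes the local SQLite build, not xowa itself.
    """
    conn = sqlite3.connect(":memory:")
    attached = 0
    try:
        for i in range(1, probe_max + 1):
            try:
                # Each ATTACH adds one more database to the same connection.
                conn.execute(f"ATTACH DATABASE ':memory:' AS link_db_{i:03d}")
            except sqlite3.OperationalError:
                # Raised once the build's attached-database limit is reached.
                break
            attached += 1
    finally:
        conn.close()
    return attached


if __name__ == "__main__":
    print("databases attachable on this build:", count_attachable())
```

On a stock SQLite build this is expected to stop at 10, which is fewer than the 14 link databases listed above; a custom build may report a different number.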

So this issue can be solved with a categories db; the table limit, however, cannot (with an ever-growing commons). I'm now going to rebuild with cat_link_db_max doubled again!

gnosygnu commented 5 years ago

Nice analysis. Just to confirm / answer a few points:

Hope the doubling worked!

desb42 commented 5 years ago

Yes, the doubling worked. I used

xowa.api.bldr.wiki.import.cat_link_db_max = 7200;

and I now have

commons.wikimedia.org-xtn.category.core.xowa           108,175,360
commons.wikimedia.org-xtn.category.link-db.001.xowa  7,970,942,976
commons.wikimedia.org-xtn.category.link-db.002.xowa  8,037,404,672
commons.wikimedia.org-xtn.category.link-db.003.xowa  8,033,693,696
commons.wikimedia.org-xtn.category.link-db.004.xowa  8,031,965,184
commons.wikimedia.org-xtn.category.link-db.005.xowa  8,032,849,920
commons.wikimedia.org-xtn.category.link-db.006.xowa  8,033,021,952
commons.wikimedia.org-xtn.category.link-db.007.xowa  6,580,187,136

Which is nicely below the attach table limit.
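The arithmetic behind the file counts can be sketched as follows. This is only an estimate under two assumptions that the thread suggests but does not state: that the total link data is roughly constant between builds, and that doubling cat_link_db_max roughly doubles the size of a full link-db file. The byte sizes are copied from the first listing (cat_link_db_max = 3600); the function name is hypothetical.

```python
import math

# Sizes in bytes of link-db.001 .. link-db.014 from the build with cat_link_db_max = 3600.
FIRST_BUILD_SIZES = [
    3_952_017_408, 3_999_875_072, 4_006_154_240, 4_012_261_376,
    4_009_504_768, 4_005_167_104, 4_009_791_488, 4_003_123_200,
    4_010_426_368, 4_003_315_712, 4_006_424_576, 4_007_567_360,
    4_005_003_264, 2_556_116_992,
]


def estimate_link_db_count(total_bytes, per_file_bytes):
    """Number of files needed if each file holds at most per_file_bytes."""
    return math.ceil(total_bytes / per_file_bytes)


total = sum(FIRST_BUILD_SIZES)   # all category link data, in bytes
full = max(FIRST_BUILD_SIZES)    # size of the largest (full) file at 3600

if __name__ == "__main__":
    print("total link data (bytes):", total)
    print("files at 3600:", estimate_link_db_count(total, full))
    print("files at 7200:", estimate_link_db_count(total, 2 * full))
```

With these figures the estimate gives 14 files at 3600 and 7 files at 7200, matching the two listings. Since commons keeps growing, the same calculation with a larger total shows when the setting would need raising again.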