Open VladimirAlexiev opened 9 years ago
There's a small difference between admin cats and hidden cats. Per https://en.wikipedia.org/wiki/Wikipedia:PROJCATS : "administration category..on article pages ..should be made a hidden category"
Nevertheless, I think Hidden_categories is the most precise way we've got to find out which categories are admin cats.
Filter out maintenance (hidden) categories and don't emit them in the dataset. These categories are useful only to Wikipedia maintainers and are not useful for content consumers.
Unfortunately DBpedia does not extract classification coming from templates (transclusion), see #378. Most hidden cats are marked in that way, so:
I think extracting from templates will be very hard to implement. Other possible sources:
SQL
All classifications are available with SQL, e.g. from http://quarry.wmflabs.org
Quarry has a timeout of 10 minutes, so it isn't appropriate for large-scale querying. If you choose SQL, you'd probably have to make a local copy of the DB.
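As a rough sketch of the SQL route against a local copy, the snippet below builds a toy in-memory SQLite database and selects only the non-hidden categories of each article. It assumes the long-standing MediaWiki layout (`page`, `categorylinks` with `cl_from`/`cl_to`, and `page_props` carrying a `hiddencat` property on hidden category pages); newer schema versions may differ, and the sample rows and the `visible_categories` helper are made up for illustration.

```python
import sqlite3

# Toy stand-in for a local MediaWiki DB copy (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);

-- Namespace 0 = article, namespace 14 = category. Sample rows are invented.
INSERT INTO page VALUES
  (1, 0, 'Berlin'),
  (10, 14, 'Capitals_in_Europe'),
  (11, 14, 'Articles_with_short_description');
INSERT INTO categorylinks VALUES
  (1, 'Capitals_in_Europe'),
  (1, 'Articles_with_short_description');
-- Hidden categories are assumed to carry the 'hiddencat' page property.
INSERT INTO page_props VALUES (11, 'hiddencat', '');
""")

VISIBLE_CATEGORIES_SQL = """
SELECT a.page_title, cl.cl_to
FROM categorylinks AS cl
JOIN page AS a ON a.page_id = cl.cl_from
WHERE a.page_namespace = 0
  AND NOT EXISTS (
    SELECT 1
    FROM page AS c
    JOIN page_props AS pp ON pp.pp_page = c.page_id
    WHERE c.page_namespace = 14
      AND c.page_title = cl.cl_to
      AND pp.pp_propname = 'hiddencat'
  )
ORDER BY a.page_title, cl.cl_to
"""

def visible_categories(connection):
    """Return (article, category) pairs, skipping hidden categories."""
    return connection.execute(VISIBLE_CATEGORIES_SQL).fetchall()

print(visible_categories(conn))
```

The same `NOT EXISTS` filter should carry over to MySQL/MariaDB on a real dump, but that has not been checked here.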
Wikipedia API
https://www.mediawiki.org/wiki/API:Categorymembers
https://www.mediawiki.org/wiki/Special:ApiSandbox
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&format=json&cmtitle=Category%3AHidden%20categories&cmprop=title%7Ctype&cmtype=page%7Csubcat&cmlimit=100
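The query above returns at most 100 members per request, so collecting the full list means following the API's `continue` block. A minimal sketch using only the standard library is below; the function names, the `fetch` parameter, and the User-Agent string are placeholders of mine, not part of any existing tool.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

def build_params(limit=100):
    """Same parameters as the example URL above."""
    return {
        "action": "query",
        "list": "categorymembers",
        "format": "json",
        "cmtitle": "Category:Hidden categories",
        "cmprop": "title|type",
        "cmtype": "page|subcat",
        "cmlimit": str(limit),
    }

def fetch_json(params):
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with something identifying your tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "hidden-cats-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def iter_hidden_categories(fetch=fetch_json, limit=100):
    """Yield titles of all subcategories of Category:Hidden categories."""
    params = build_params(limit)
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            if member.get("type") == "subcat":
                yield member["title"]
        continuation = data.get("continue")
        if not continuation:
            break
        # Merge the continuation tokens (e.g. cmcontinue) into the next request.
        params = {**build_params(limit), **continuation}
```

Calling `set(iter_hidden_categories())` would then give a lookup set to filter category statements against. The `fetch` argument exists only so the paging logic can be exercised without network access.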
You could try different return formats:
Notes