bennofs / wdumper

Tool for generating filtered Wikidata RDF exports
https://tools.wmflabs.org/wdumps
MIT License

Get a dump of everything but scientific articles #2

Open wetneb opened 4 years ago

wetneb commented 4 years ago

Thanks for this great tool!

I would be interested in generating a dump of all Wikidata items except those that have P31:Q13442814 (instance of: scholarly article). It's not clear to me whether this is doable yet.
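A minimal sketch of the requested filter. It works on the Wikidata JSON dump format directly, not through wdumper, whose filter configuration isn't shown in this thread, so nothing here reflects wdumper's API. It assumes one entity per line, as in the `latest-all.json.gz` entity dumps, and that P31 values are `wikibase-entityid` objects carrying an `id` field. It only checks direct P31 statements, so items typed through a subclass of Q13442814 would still pass, and it ignores statement rank. The names `instance_of_ids` and `filter_dump_lines` are hypothetical.

```python
import json
from typing import Iterable, Iterator

# P31 = "instance of"; Q13442814 = "scholarly article"
EXCLUDED_CLASS = "Q13442814"


def instance_of_ids(entity: dict) -> set:
    """Return the item IDs this entity is directly declared an instance of (P31)."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no item ID
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def filter_dump_lines(lines: Iterable[str], excluded: str = EXCLUDED_CLASS) -> Iterator[dict]:
    """Yield entities from a line-per-entity JSON dump, skipping direct instances of `excluded`."""
    for line in lines:
        line = line.strip().rstrip(",")  # entity lines end with "," inside the JSON array
        if not line or line in ("[", "]"):
            continue  # array brackets and blank lines hold no entity
        entity = json.loads(line)
        if excluded not in instance_of_ids(entity):
            yield entity
```

To stream a real dump, open your local copy with `gzip.open(path, "rt")` and pass the file object to `filter_dump_lines`; because it is a generator, memory use stays at one entity at a time.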

WolfgangFahl commented 4 years ago

Scholarly articles are at https://www.wikidata.org/wiki/Q13442814. Scholia (https://tools.wmflabs.org/scholia/) has statistics on how many triples you'd save by excluding them. It would be only 3% of all triples... Still, a feature to filter out certain entities might be worthwhile.

wetneb commented 4 years ago

It would be only 3% of all triples ...

Are you sure about this? Where do you see this figure? Scholia does report 11,186,800,006 Wikidata triples, but I don't see a figure for the number of triples belonging to scientific articles. I expect that to be much more than 3%…

WolfgangFahl commented 4 years ago

It says "35718600 | Scholarly articles"... Yes, you are right: once the triples for all of their properties are counted, the share will be higher than 3%.
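A back-of-the-envelope estimate using the two Scholia figures quoted in this thread. The average number of triples per article is a hypothetical parameter, not a measured value. With only one triple per article the share would be about 0.3%, so the real share depends almost entirely on how many triples an average article item produces.

```python
# Figures quoted in this thread from Scholia (at the time of the discussion).
TOTAL_TRIPLES = 11_186_800_006
SCHOLARLY_ARTICLES = 35_718_600


def triple_share(triples_per_article: float) -> float:
    """Fraction of all triples attributable to scholarly articles,
    assuming each article contributes `triples_per_article` triples on average."""
    return SCHOLARLY_ARTICLES * triples_per_article / TOTAL_TRIPLES


# Hypothetical averages; substitute a measured value if one is available.
for avg in (1, 10, 30):
    print(f"{avg:>3} triples/article -> {triple_share(avg):.1%}")
```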

danbri commented 3 years ago

Any progress on this?

danbri commented 3 years ago

Some more stats links:

ScholarlyArticle and astronomical object are interesting subsets, either to extract and keep or to exclude, depending on the purpose.