Open nichtich opened 7 years ago
RePEc Short-ID is a very nice and convincing case for domains. I've done a bit of research on this and came across several obstacles for other cases:
Thus, the domain is meaningful in two different ways:
a) as a formal restriction (e.g., you want to focus on GND persons only, which means that we are not dealing with P227 as a whole, but with P227 for instances of human. This relates to case 4 above). Since there are many variations of this kind of restriction, depending on the property/KOS, I suppose it should be considered "out of scope" here, or implemented as an extension for a few selected and manually configured use cases.
b) as a general background for comparisons (e.g., in Venn diagrams) in order to give an idea of "how much is covered". Perhaps it would make sense to have a separate data structure for "basic sets" (with title, size and description), and another data structure "base2prop" to relate these "basic sets" to properties or intersections of properties. This could be extended via pull requests by anybody who is interested. The "size" could be just an arbitrary estimate, or it could optionally be expressed as a SPARQL query which computes the estimate from a Wikidata query result (e.g., twice the number of humans with occupation "economist", rounded to full thousands), in order to keep up with the growth of Wikidata. Unfortunately, due to a), all of this would leave out a lot of interesting use cases ...
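The two structures proposed in b) could look roughly like the following. This is only a sketch to make the idea concrete: the names `BasicSet`, `BASIC_SETS`, `BASE2PROP`, the `factor` field and the helper `basic_set_for` are hypothetical, the QID used for "economist" (Q188094) is an assumption that should be verified, and relating P2428 to an "economists" set is just an illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicSet:
    """A "basic set" used as background for comparisons."""
    title: str
    description: str
    size: Optional[int] = None        # fixed, hand-picked estimate
    size_query: Optional[str] = None  # SPARQL query returning ?c (optional)
    factor: float = 1.0               # multiplier applied to the query count

    def estimate(self, query_count: Optional[int] = None) -> int:
        """Return the size estimate.

        If a count obtained from size_query is passed in, scale it by
        factor and round to full thousands; otherwise fall back to the
        fixed size.
        """
        if query_count is not None:
            return int(round(query_count * self.factor, -3))
        if self.size is None:
            raise ValueError(f"no size available for {self.title!r}")
        return self.size


# Hypothetical registry of basic sets, extensible via pull requests.
BASIC_SETS = {
    "economists": BasicSet(
        title="Economists",
        description="Twice the number of economists known to Wikidata",
        # Q188094 is assumed to be "economist"; verify before use.
        size_query=(
            "SELECT (COUNT(?x) AS ?c) "
            "{ ?x wdt:P31 wd:Q5 ; wdt:P106 wd:Q188094 }"
        ),
        factor=2.0,
    ),
}

# "base2prop": a property, or an intersection of properties, -> basic set key.
BASE2PROP = {
    frozenset({"P2428"}): "economists",  # illustrative mapping only
}


def basic_set_for(*props: str) -> Optional[BasicSet]:
    """Look up the basic set for a property or intersection of properties."""
    key = BASE2PROP.get(frozenset(props))
    return BASIC_SETS.get(key) if key else None
```

Keying `BASE2PROP` by a frozenset means the order of properties in an intersection does not matter, and a single property is just a one-element set.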
I just reread my original statement: the "total number of humans" is 7.5 billion, which is not the number to compare with. To answer:
1) Total numbers can be counted without a timeout, e.g. SELECT (COUNT(?x) AS ?c) { ?x wdt:P31/wdt:P279* wd:Q2221906 }
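For reference, a count query like the one in 1) could be run from a script along these lines. This is a minimal sketch using only the Python standard library and the public endpoint at https://query.wikidata.org/sparql; the function names and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The count query from 1): instances of (subclasses of) Q2221906.
COUNT_QUERY = "SELECT (COUNT(?x) AS ?c) { ?x wdt:P31/wdt:P279* wd:Q2221906 }"


def build_url(query: str) -> str:
    """Build a GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def parse_count(payload: dict, var: str = "c") -> int:
    """Extract the single count from a SPARQL JSON result document."""
    return int(payload["results"]["bindings"][0][var]["value"])


def fetch_count(query: str, timeout: float = 60.0) -> int:
    """Run the query against the endpoint and return the count."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "kos-size-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_count(json.load(response))
```

Calling `fetch_count(COUNT_QUERY)` performs the actual request; `parse_count` is kept separate so the result handling can be checked without network access.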
2) Yes, so let's start with easy cases
3) Same as 2); this is better provided as an educated guess.
4a) and 4b) relate to indirect mappings only. For direct mappings (Wikidata-to-KOS) there are always two numbers
I'd start with the size of the full KOS and with mapping candidates in Wikidata expressible as a SPARQL query, because both can be queried from Wikidata. See https://www.wikidata.org/wiki/Q51044 and property quantity (P1114) for an example.
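To illustrate how the two numbers could be combined, here is a small sketch. It assumes the KOS size is stored as quantity (P1114) on the KOS item and that the number of mapping candidates comes from a separate count query; `quantity_query` and `coverage` are hypothetical helper names.

```python
import re


def quantity_query(qid: str) -> str:
    """SPARQL query reading quantity (P1114) from a Wikidata item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"SELECT ?n {{ wd:{qid} wdt:P1114 ?n }}"


def coverage(candidates: int, kos_size: int) -> float:
    """Ratio of mapping candidates in Wikidata to the full KOS size."""
    if kos_size <= 0:
        raise ValueError("KOS size must be positive")
    return candidates / kos_size
```

For the item linked above, `quantity_query("Q51044")` yields the query that reads its P1114 value; the result can then be passed to `coverage` together with the candidate count.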
Completely agree with your action plan. Wow, the geographic location query took less than 10 seconds. Amazing that Blazegraph is so much more optimized (perhaps using some cached statistics) than Fuseki. Re. 4 (partial KOS is only relevant for indirect mappings): I suppose you misunderstood my intent. I think there is value in comparing the amount of, e.g., gndo:DifferentiatedPersons to wd:Q5 instances, or of gndo:CorporateBody to Q43229 instances. But that cannot be attached to an item in WD as a quantity, which is the elegant approach and brings a huge advantage over custom config files. So let's start with the easy cases.
E.g., P2428 has domain human, so add the total number of humans for comparison.