Open sshivam95 opened 5 months ago
To link the full WDC dataset, we need to take care of the `<RESTRICTION>` tag. Class identifiers in Wikidata are opaque and do not make sense to humans, so we are interested in their `rdfs:label` property.
Step | Description |
---|---|
Gather Wikidata Classes | Gather all the Wikidata classes with rdfs:label using a SPARQL query |
Gather WDC Dataset Classes | Gather all the classes from each WDC dataset |
Link Dataset Classes | Link the dataset classes with Wikidata classes |
Store Linked Classes | Keep these linked classes for further checking |
Automate Config Creation | Automate the creation of config files for LIMES linking based on the KG with only triples |
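
As a rough illustration of the last step, a config file could be generated from a template along the following lines. This is only a sketch: the element names follow the LIMES XML configuration layout as I understand it and should be checked against the LIMES documentation. The function name `build_limes_config`, the file names, the `trigrams` metric, the threshold values and the restriction patterns are all hypothetical placeholders, not the settings used in this project.

```python
import xml.etree.ElementTree as ET


def _add(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append a child element that carries plain text."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def _add_kb(root: ET.Element, role: str, kb_id: str, endpoint: str,
            var: str, restriction: str) -> None:
    """Add a SOURCE or TARGET block (layout assumed from the LIMES docs)."""
    kb = ET.SubElement(root, role)
    _add(kb, "ID", kb_id)
    _add(kb, "ENDPOINT", endpoint)        # here: path to a local dump file
    _add(kb, "VAR", var)
    _add(kb, "PAGESIZE", "-1")
    _add(kb, "RESTRICTION", restriction)  # the tag discussed in this issue
    _add(kb, "PROPERTY", "rdfs:label")
    _add(kb, "TYPE", "NT")


def build_limes_config(source_file: str, target_file: str,
                       source_restriction: str, target_restriction: str,
                       threshold: float = 0.9) -> str:
    """Return a LIMES-style config as an XML string (hypothetical defaults)."""
    root = ET.Element("LIMES")
    for label, namespace in (
        ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
        ("owl", "http://www.w3.org/2002/07/owl#"),
    ):
        prefix = ET.SubElement(root, "PREFIX")
        _add(prefix, "NAMESPACE", namespace)
        _add(prefix, "LABEL", label)

    _add_kb(root, "SOURCE", "wikidata", source_file, "?x", source_restriction)
    _add_kb(root, "TARGET", "kg", target_file, "?y", target_restriction)

    # Assumed metric: string similarity on the labels of both sides.
    _add(root, "METRIC", "trigrams(x.rdfs:label, y.rdfs:label)")

    acceptance = ET.SubElement(root, "ACCEPTANCE")
    _add(acceptance, "THRESHOLD", str(threshold))
    _add(acceptance, "FILE", "accepted.nt")
    _add(acceptance, "RELATION", "owl:equivalentClass")

    review = ET.SubElement(root, "REVIEW")
    _add(review, "THRESHOLD", str(round(threshold - 0.1, 2)))
    _add(review, "FILE", "review.nt")
    _add(review, "RELATION", "owl:equivalentClass")

    _add(root, "OUTPUT", "NT")
    return ET.tostring(root, encoding="unicode")
```

Calling `build_limes_config("wikidata_classes.nt", "kg_classes.nt", "?x rdfs:label ?l", "?y rdfs:label ?l")` once per combined dataset would produce one config string per linking run.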
For the step Gather Wikidata Classes, the following SPARQL query is used:

    SELECT ?class ?classLabel
    WHERE
    {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "{list of languages}". }
      {
        SELECT DISTINCT ?class WHERE {
          ?s wdt:P31 ?class .
        } OFFSET 1000 LIMIT 100
      }
    }

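
Since the query pages through the classes with `OFFSET` and `LIMIT`, a job would need to issue it repeatedly with increasing offsets. A minimal sketch of building those paged queries is shown here. The helper names `build_class_query` and `page_queries` are hypothetical, and the language list is assumed to be passed to the label service as a comma-separated string; sending the queries to the endpoint is left out.

```python
from typing import Iterator, Sequence

# Same query as above, with the language list, offset and limit as placeholders.
QUERY_TEMPLATE = """\
SELECT ?class ?classLabel
WHERE
{{
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{languages}". }}
  {{
    SELECT DISTINCT ?class WHERE {{
      ?s wdt:P31 ?class .
    }} OFFSET {offset} LIMIT {limit}
  }}
}}
"""


def build_class_query(languages: Sequence[str], offset: int, limit: int) -> str:
    """Fill the template for one page of results."""
    if not languages:
        raise ValueError("at least one language code is required")
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return QUERY_TEMPLATE.format(
        languages=",".join(languages), offset=offset, limit=limit
    )


def page_queries(languages: Sequence[str], page_size: int,
                 max_pages: int) -> Iterator[str]:
    """Yield one query per page: offsets 0, page_size, 2 * page_size, ..."""
    for page in range(max_pages):
        yield build_class_query(languages, page * page_size, page_size)
```

For example, `build_class_query(["en", "de"], 1000, 100)` reproduces the page shown in the query above for English and German labels.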
Languages for the WDC labels are extracted.
- [x] Job for getting Wikidata classes for those languages
Update:
To avoid the complexity caused by the large number of files, 99% of the files in each format were combined, for linking only. This reduces the number of datasets LIMES has to be applied to from, e.g., $265$ to only $15$.
wiki_class owl:equivalentClass KG_class .
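
A small sketch of how linked class pairs could be serialized in this form as N-Triples. The function names and the example IRIs are illustrative assumptions, not the project's actual code or data.

```python
from typing import Iterable, Tuple

OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"

# Characters that may not appear unescaped inside an N-Triples IRI.
_FORBIDDEN = set('<>"{}|^`\\ ')


def to_ntriple(wiki_class: str, kg_class: str) -> str:
    """Format one `wiki_class owl:equivalentClass KG_class .` statement."""
    for iri in (wiki_class, kg_class):
        if not iri or any(ch in _FORBIDDEN for ch in iri):
            raise ValueError(f"not a usable IRI: {iri!r}")
    return f"<{wiki_class}> <{OWL_EQUIVALENT_CLASS}> <{kg_class}> ."


def links_to_ntriples(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize all (wiki_class, kg_class) pairs, one triple per line."""
    return "".join(to_ntriple(wiki, kg) + "\n" for wiki, kg in pairs)
```

For instance, `to_ntriple("http://www.wikidata.org/entity/Q5", "http://schema.org/Person")` yields a single line that can be appended to the file of linked classes.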
Combine 99% of the named KGs in a dataset to avoid creating a large number of LIMES configs, same as in https://github.com/dice-group/WHALE/issues/9#issuecomment-2194912993
Next step to #9