dice-group / WHALE

Link individual small graphs with Wikidata using LIMES #10

Open sshivam95 opened 5 months ago

sshivam95 commented 5 months ago

Next step after #9.

sshivam95 commented 5 months ago

To link the full WDC dataset, we need to take care with the <RESTRICTION> tag, because the class identifiers in Wikidata are opaque, e.g.:

etc. These do not make sense to humans, so we are interested in their rdfs:label property instead.
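
To make the role of <RESTRICTION> and rdfs:label concrete, here is a minimal sketch that builds one LIMES <SOURCE> block with Python's standard library. The helper name `build_source`, the file path, and the class `schema:Product` are hypothetical illustration values, not taken from this repository, and the `PAGESIZE` and `TYPE` values are assumptions to check against the LIMES documentation. A prefixed class name also requires a matching <PREFIX> entry elsewhere in the config.

```python
import xml.etree.ElementTree as ET


def build_source(kb_id, endpoint, class_name, data_type="NT"):
    """Build a LIMES <SOURCE> element restricted to instances of one class."""
    source = ET.Element("SOURCE")
    ET.SubElement(source, "ID").text = kb_id
    ET.SubElement(source, "ENDPOINT").text = endpoint
    ET.SubElement(source, "VAR").text = "?x"
    # Assumption: -1 asks LIMES to fetch all results in one page.
    ET.SubElement(source, "PAGESIZE").text = "-1"
    # The restriction is a triple pattern selecting which resources get linked.
    ET.SubElement(source, "RESTRICTION").text = f"?x rdf:type {class_name}"
    # Compare resources on their readable label, not on opaque identifiers.
    ET.SubElement(source, "PROPERTY").text = "rdfs:label"
    ET.SubElement(source, "TYPE").text = data_type
    return source


# Hypothetical values, for illustration only.
source = build_source("wdc", "data/combined.nt", "schema:Product")
xml_text = ET.tostring(source, encoding="unicode")
print(xml_text)
```

The <TARGET> block for Wikidata would follow the same shape with its own variable (e.g. `?y`) and a restriction on the linked Wikidata class.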

Working pipeline:

| Step | Description |
| --- | --- |
| Gather Wikidata Classes | Gather all the Wikidata classes with rdfs:label using a SPARQL query |
| Gather WDC Dataset Classes | Gather all the classes from each WDC dataset |
| Link Dataset Classes | Link the dataset classes with Wikidata classes |
| Store Linked Classes | Keep these linked classes for further checking |
| Automate Config Creation | Automate the creation of config files for LIMES linking based on the KG with only triples |

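
The first three steps of the pipeline above could be sketched roughly as follows. This is only an illustration under stated assumptions: the query treats anything that appears as the object of `wdt:P279` (subclass of) as a class, which may not match the query actually used here, and running it against the full public endpoint would probably need paging or a local dump. The function names, the exact-match linking strategy, and the sample IRIs under `example.org` are all hypothetical placeholders.

```python
import json
import re

# Assumption: a "class" is any entity used as the object of wdt:P279.
CLASS_QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?class ?label WHERE {
  ?sub wdt:P279 ?class .
  ?class rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
"""


def normalize(label):
    """Split camelCase, lower-case, and collapse non-alphanumerics to spaces."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return re.sub(r"[^a-z0-9]+", " ", spaced.lower()).strip()


def labels_from_sparql_json(text):
    """Map normalized label -> class IRI from a SPARQL JSON result document."""
    bindings = json.loads(text)["results"]["bindings"]
    return {normalize(b["label"]["value"]): b["class"]["value"] for b in bindings}


def link_classes(dataset_classes, wikidata_by_label):
    """Link dataset classes to Wikidata classes by exact normalized label."""
    links = {}
    for iri in dataset_classes:
        # Use the last path or fragment segment of the IRI as the class name.
        local_name = re.split(r"[/#]", iri.rstrip("/#"))[-1]
        target = wikidata_by_label.get(normalize(local_name))
        if target is not None:
            links[iri] = target
    return links


# Made-up response in the SPARQL JSON results shape; IRIs are placeholders.
sample = json.dumps({"results": {"bindings": [
    {"class": {"value": "http://example.org/entity/Q_a"},
     "label": {"value": "product"}},
    {"class": {"value": "http://example.org/entity/Q_b"},
     "label": {"value": "local business"}},
]}})

wikidata_by_label = labels_from_sparql_json(sample)
links = link_classes(
    ["http://schema.org/Product", "http://schema.org/LocalBusiness",
     "http://schema.org/Recipe"],
    wikidata_by_label,
)
print(links)
```

Classes without a label match (here `Recipe`) are simply left out, so the stored links can be checked by hand before any LIMES config is generated from them.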
sshivam95 commented 5 months ago

For step Gather Wikidata Classes,

sshivam95 commented 5 months ago
sshivam95 commented 4 months ago

Update:

sshivam95 commented 4 months ago

To keep linking manageable given the large number of files, 99% of the files in each format were combined; this was done for linking only. It reduces the number of datasets LIMES has to be run on from, e.g., $265$ to only $15$.
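
A rough sketch of the combining step, assuming the files are line-based RDF (such as N-Triples or N-Quads) so that plain concatenation is valid. How the 99% subset is chosen is not specified in this thread, so the sketch merges whatever list of paths it is given. The function names and the idea of reading the format from the parent directory name are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_format(paths, key=lambda p: p.parent.name):
    """Group file paths by format; by default the parent directory name."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        groups[key(path)].append(path)
    return dict(groups)


def combine(paths, out_file):
    """Concatenate line-based RDF files into one file, line by line."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    count += 1
    return count


def combine_all(paths, out_dir, suffix=".nq"):
    """Write one combined file per format and return format -> output path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for fmt, files in group_by_format(paths).items():
        outputs[fmt] = out_dir / f"{fmt}_combined{suffix}"
        combine(sorted(files), outputs[fmt])
    return outputs
```

With one combined file per format, only one LIMES config per combined file is needed instead of one per original file.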

sshivam95 commented 4 months ago

Combining 99% of the named KGs in a dataset avoids having to create a large number of LIMES configs. Same approach as in https://github.com/dice-group/WHALE/issues/9#issuecomment-2194912993