SWAT4HCLS / Biohackathon-SWAT4HCLS-2023


[Pitch]: Creating life-science subsets from Wikidata #1

Open andrawaag opened 1 year ago


Short title

Extracting sensible subsets from Wikidata

Pitch

Wikidata is the linked-data repository of the Wikimedia Foundation. It is a generic knowledge graph that has grown substantially since its launch in 2012. Besides often being too big to handle, Wikidata is a moving target: its content is constantly in flux, which can be a concern for reproducibility when Wikidata is used as a data source.

Wikidata does, however, provide regular dumps, and methods exist to extract subsets from them. This allows for more manageable topical knowledge graphs derived from Wikidata, which could support more complex querying than is currently possible on Wikidata itself.

During the hackathon, we would like to define the boundaries of life-science subsets and subsequently extract those subsets from Wikidata.
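To make the idea concrete, here is a minimal sketch of one possible way to cut a subset out of a Wikidata JSON dump. It is not the method proposed for the hackathon, only an illustration. It assumes the standard JSON dump layout (a JSON array with one entity per line) and treats the subset boundary as "any item that is an instance of (P31) one of a few classes". The class list (gene, protein, disease) and the function names are assumptions chosen for the example.

```python
import json
from typing import Iterable, Iterator, Set

# Illustrative subset boundary (an assumption, not an agreed definition):
# items that are an instance of gene (Q7187), protein (Q8054) or disease (Q12136).
TARGET_CLASSES = {"Q7187", "Q8054", "Q12136"}


def instance_of(entity: dict) -> Set[str]:
    """Return the QIDs that an entity is an 'instance of' (P31)."""
    qids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # Skip 'somevalue' / 'novalue' snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            qids.add(snak["datavalue"]["value"]["id"])
    return qids


def extract_subset(
    lines: Iterable[str], classes: Set[str] = TARGET_CLASSES
) -> Iterator[dict]:
    """Yield entities from JSON-dump lines whose P31 values hit `classes`."""
    for line in lines:
        # Each entity sits on its own line, followed by a trailing comma;
        # the first and last lines hold only the array brackets.
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        entity = json.loads(line)
        if instance_of(entity) & classes:
            yield entity
```

Because `extract_subset` takes any iterable of lines, it can be fed a dump streamed through `gzip.open(path, "rt", encoding="utf-8")` without loading the whole file into memory. A boundary expressed in ShEx or SPARQL would be more expressive than this single-property filter; the sketch only shows the shape of the extraction step.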

Expertise needed

Familiarity with SPARQL, ShEx, RDF and Wikidata