dbpedia / GSoC

Google Summer of Code organization

Golden standard and quality tool for DBpedia types #39

Closed: MarianoRico closed this issue 4 years ago

MarianoRico commented 5 years ago

Description

Several works from academia and industry exploit the "type" of DBpedia resources. This "type" is a class in the DBpedia ontology, such as Person, Movie or Device. The type is derived from (1) the resource's Wikipedia infobox and (2) the human-created mapping from that infobox to an ontology class. Therefore, DBpedia extractors cannot assign a type to a resource when (1) the resource has no infobox in Wikipedia, or (2) the resource has an infobox that has not been mapped. For many languages, as many as 50% of resources are untyped. Several experimental studies have tried to infer the type of a resource from the "connections" it has in the graph it belongs to. For instance, [1] follows a statistical approach, and [2] follows a machine learning approach.
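To make the idea of inferring a type from a resource's "connections" concrete, here is a heavily simplified Python sketch in the spirit of the statistical approach of [1]; it is not the published algorithm. It assumes triples are available as `(subject, predicate, object)` string tuples and known types as a `resource -> class` dict, and it only looks at outgoing predicates. The toy resources and the helper names `predicate_type_distributions` and `infer_type` are hypothetical.

```python
from collections import Counter, defaultdict


def predicate_type_distributions(triples, types):
    """For each predicate, the share of each class among the typed subjects using it."""
    # Count each (subject, predicate) pair once, so repeated values add no extra weight.
    pairs = {(s, p) for s, p, _o in triples if s in types}
    counts = defaultdict(Counter)
    for s, p in pairs:
        counts[p][types[s]] += 1
    return {
        p: {cls: n / sum(c.values()) for cls, n in c.items()}
        for p, c in counts.items()
    }


def infer_type(resource, triples, dists):
    """Average the class distributions of the resource's outgoing predicates."""
    preds = {p for s, p, _o in triples if s == resource and p in dists}
    if not preds:
        return None  # no known predicate: nothing to infer from
    scores = Counter()
    for p in preds:
        for cls, share in dists[p].items():
            scores[cls] += share / len(preds)
    return scores.most_common(1)[0]  # (best class, score in [0, 1])


# Hypothetical toy data; Carol has no type.
triples = [
    ("Alice", "dbo:birthPlace", "Paris"),
    ("Bob", "dbo:birthPlace", "Rome"),
    ("The_Matrix", "dbo:director", "The_Wachowskis"),
    ("Carol", "dbo:birthPlace", "Lyon"),
]
types = {"Alice": "Person", "Bob": "Person", "The_Matrix": "Film"}
dists = predicate_type_distributions(triples, types)
print(infer_type("Carol", triples, dists))  # ('Person', 1.0)
```

Real approaches also weight predicates by how discriminative they are and use incoming links; the sketch omits both for brevity.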
However, validating type-inference approaches such as [1] and [2] is not simple: DBpedia classes form a hierarchy up to 7 levels deep (Writer is a subclass of Person, Poet is a subclass of Writer, etc.), and deeper levels tend to contain fewer resources. Therefore, the precision and recall of "type predictors" must be validated per class or, at least, per level.
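Per-class and per-level evaluation can be sketched as follows. This Python sketch assumes that a gold standard and a predictor's output are both given as `resource -> class` dicts, and uses a simplified, hypothetical `child -> parent` fragment of the hierarchy (following the example above, it places Writer directly under Person). It counts only exact class matches, so predicting Writer for a gold Poet is a false negative for Poet and a false positive for Writer; hierarchy-aware scoring would be a different design choice.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified fragment of the DBpedia ontology (child -> parent).
PARENT = {
    "Agent": "owl:Thing",
    "Person": "Agent",
    "Writer": "Person",
    "Poet": "Writer",
    "Work": "owl:Thing",
    "Film": "Work",
}


def depth(cls):
    """Number of subclass steps from cls up to owl:Thing."""
    d = 0
    while cls != "owl:Thing":
        cls = PARENT[cls]
        d += 1
    return d


def counts(gold, predicted):
    """Per-class TP/FP/FN over the resources in the gold standard (exact match)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for res, g in gold.items():
        p = predicted.get(res)
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1
            if p is not None:
                fp[p] += 1
    return tp, fp, fn


def ratio(a, b):
    return a / b if b else 0.0


def per_class(gold, predicted):
    """(precision, recall) for every class seen in the gold standard or predictions."""
    tp, fp, fn = counts(gold, predicted)
    return {
        c: (ratio(tp[c], tp[c] + fp[c]), ratio(tp[c], tp[c] + fn[c]))
        for c in sorted(set(tp) | set(fp) | set(fn))
    }


def per_level(gold, predicted):
    """Pool TP/FP/FN of all classes at the same depth, then compute (precision, recall)."""
    tp, fp, fn = counts(gold, predicted)
    pooled = defaultdict(lambda: [0, 0, 0])
    for c in set(tp) | set(fp) | set(fn):
        pooled[depth(c)][0] += tp[c]
        pooled[depth(c)][1] += fp[c]
        pooled[depth(c)][2] += fn[c]
    return {
        lvl: (ratio(t, t + f), ratio(t, t + n))
        for lvl, (t, f, n) in sorted(pooled.items())
    }


gold = {"r1": "Poet", "r2": "Poet", "r3": "Writer", "r4": "Film"}
pred = {"r1": "Poet", "r2": "Writer", "r3": "Writer", "r4": "Film"}
print(per_class(gold, pred))  # {'Film': (1.0, 1.0), 'Poet': (1.0, 0.5), 'Writer': (0.5, 1.0)}
print(per_level(gold, pred))  # {2: (1.0, 1.0), 3: (0.5, 1.0), 4: (1.0, 0.5)}
```

Pooling per level is what keeps the sparse deep classes from being drowned out by the large shallow ones in a single global score.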

[1] Paulheim, H., Bizer, C.: Type inference on noisy RDF data. ISWC 2013. LNCS, vol. 8218, pp. 510–525. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_32

[2] Rico, M., Santana-Pérez, I., Pozo-Jiménez, P., Gómez-Pérez, A.: Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case. EKAW 2018. LNCS, vol. 11313. Springer (2018). https://doi.org/10.1007/978-3-030-03667-6_21

Goals

To perform this validation, we need a "golden standard" in which the type of several resources has been manually verified for each class of the ontology. This golden standard should be built with ad hoc software tools, ideally a web application.
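The data side of such a tool could look like the Python sketch below: draw a fixed-size sample of resources per class, show each one to an annotator together with a proposed class (the current DBpedia type or a predictor's guess), and keep only confirmed or corrected classes. The names `sample_per_class`, `Judgement` and `build_gold`, the toy resources, the sample size and the seed are all hypothetical assumptions, not part of the project description.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


def sample_per_class(candidates, k, seed=0):
    """Stratified sample: up to k resources per class, reproducible via the seed."""
    by_class = defaultdict(list)
    for res, cls in candidates.items():
        by_class[cls].append(res)
    rng = random.Random(seed)
    return {
        cls: rng.sample(sorted(members), min(k, len(members)))
        for cls, members in sorted(by_class.items())
    }


@dataclass
class Judgement:
    resource: str
    proposed: str                    # class shown to the annotator
    accepted: bool                   # annotator confirms the proposed class
    corrected: Optional[str] = None  # class chosen by the annotator on rejection


def build_gold(judgements):
    """Keep confirmed classes and annotator corrections; drop unresolved rejections."""
    gold = {}
    for j in judgements:
        if j.accepted:
            gold[j.resource] = j.proposed
        elif j.corrected is not None:
            gold[j.resource] = j.corrected
    return gold


candidates = {"Alice": "Person", "Bob": "Person", "Carol": "Person", "The_Matrix": "Film"}
print(sample_per_class(candidates, k=2))
judgements = [
    Judgement("Alice", "Person", True),
    Judgement("Bob", "Person", False, corrected="Writer"),
    Judgement("The_Matrix", "Person", False),
]
print(build_gold(judgements))  # {'Alice': 'Person', 'Bob': 'Writer'}
```

Sampling per class rather than uniformly ensures that deep, sparsely populated classes still receive verified examples.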

Impact

Enhance the quality of DBpedia. With this golden standard, we could more easily evaluate approaches that assign a type to an untyped resource. It could also help us assign alternative types to already typed resources, for example a more specific (deeper) type or, perhaps, an alternative type in another branch of the DBpedia class hierarchy.
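Deciding whether a proposed alternative type is a deeper type or one in another branch only requires comparing ancestor chains. A minimal Python sketch, again over a hypothetical, simplified `child -> parent` fragment of the hierarchy; the function names are illustrative only.

```python
# Hypothetical, simplified fragment of the DBpedia ontology (child -> parent).
PARENT = {
    "Agent": "owl:Thing",
    "Person": "Agent",
    "Writer": "Person",
    "Poet": "Writer",
    "Work": "owl:Thing",
    "Film": "Work",
}


def ancestors(cls):
    """All superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def relation(current, proposed):
    """How a proposed type relates to a resource's current type."""
    if proposed == current:
        return "same"
    if current in ancestors(proposed):
        return "more specific"  # proposed is deeper in the same branch
    if proposed in ancestors(current):
        return "more general"
    return "other branch"


print(relation("Writer", "Poet"))  # more specific
print(relation("Person", "Film"))  # other branch
```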

Caveats

Ideal profile

Experience with Linked Data technologies (RDF, SPARQL) and with web application development.

Warm-up tasks

Mentors

Mariano Rico

Keywords

golden standard, resource type validation.