dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Mount Everest elevation missing - due to Wikidata reference? #767

Open AJKellmann opened 1 month ago

AJKellmann commented 1 month ago

Issue validity

The issue still persists, here is the link: http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Mount+Everest&revid=&format=n-triples&extractors=custom
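The check behind this link can be scripted. The following is a minimal sketch, not part of the extraction framework: the helper names `extraction_url` and `lines_with_predicate` are hypothetical, and the line scanner is a simple whitespace split rather than a full N-Triples parser. It rebuilds the URL above from its query parameters and searches an N-Triples document for a given predicate.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the link in this report.
BASE = "http://dief.tools.dbpedia.org/server/extraction/en/extract"
ELEVATION = "http://dbpedia.org/ontology/elevation"


def extraction_url(title, extractors="custom", fmt="n-triples"):
    """Build the extraction URL for a page title (hypothetical helper)."""
    params = {"title": title, "revid": "", "format": fmt, "extractors": extractors}
    return BASE + "?" + urlencode(params)


def lines_with_predicate(ntriples, predicate):
    """Return the N-Triples lines whose predicate term is <predicate>.

    Assumes one triple per line; subject and predicate contain no spaces.
    """
    needle = "<" + predicate + ">"
    hits = []
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[1] == needle:
            hits.append(line)
    return hits
```

To run this against the live service, fetch `extraction_url("Mount Everest")` (for example with `urllib.request.urlopen`), decode the body, and pass it to `lines_with_predicate(body, ELEVATION)`. An empty list would correspond to the missing elevation described below.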

Error Description

When I wrote a SPARQL query to find the highest mountain in the world, I noticed that Mount Everest was missing from the results. The cause appears to be that the extracted data for Mount Everest contains no http://dbpedia.org/ontology/elevation property.

The Wikipedia page contains the correct height, but it is referenced from Wikidata rather than being directly included in the Infobox as is common for other mountains.

[Screenshot of the Wikipedia page]

This might also affect other Wikipedia entries that load their values from Wikidata instead of stating them explicitly in the Infobox.

Pinpointing the source of the error

The issue was discovered in the SPARQL endpoint at http://dbpedia.org/sparql. Here is the SPARQL query that highlights the problem:

SELECT DISTINCT ?mountain ?height WHERE {
  ?mountain <http://dbpedia.org/ontology/elevation> ?height .
  ?mountain a <http://schema.org/Mountain> .
}
ORDER BY DESC(?height)
LIMIT 10

Expected result: Mount Everest should appear with an elevation of 8848.86 meters.
Actual result: Mount Everest does not appear in the list, indicating the elevation data is missing.

Details

Wrong triples / missing data: There is no http://dbpedia.org/ontology/elevation triple for the resource Mount Everest in the current DBpedia data.

Expected corrected RDF outcome: <http://dbpedia.org/resource/Mount_Everest> <http://dbpedia.org/ontology/elevation> "8848.86"^^<http://www.w3.org/2001/XMLSchema#double> .
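The missing triple can also be confirmed directly, without ranking all mountains. The sketch below builds a SPARQL ASK query for a single resource and property, then encodes it as a GET request against the endpoint named in this report. The helper names `ask_has_property` and `request_url` are hypothetical. Only the standard `query` parameter of the SPARQL protocol is used, and how the endpoint formats its response is not assumed here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in this report
EVEREST = "http://dbpedia.org/resource/Mount_Everest"
ELEVATION = "http://dbpedia.org/ontology/elevation"


def ask_has_property(resource, prop):
    """Return an ASK query that is true if `resource` has any value for `prop`."""
    return "ASK { <%s> <%s> ?value }" % (resource, prop)


def request_url(query):
    """Encode the query as a SPARQL-protocol GET URL (hypothetical helper)."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL only; no network request is made here.
    print(request_url(ask_has_property(EVEREST, ELEVATION)))
```

Given the behaviour described above, the ASK query would be expected to return false for Mount Everest until the elevation is extracted. The same helper can be pointed at other resources suspected of taking their infobox values from Wikidata.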