dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Wrong lat/long for many entries in geo_coordinates*_en.ttl #489

Open mazieres opened 7 years ago

mazieres commented 7 years ago

I've found many errors in the lat/long reported in both geo_coordinates_en.ttl and geo_coordinates_mappingbased_en.ttl (2016-04).

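It seems that the conversion from compass-direction format to signed-degrees format sometimes fails: the latitude of Western Australia (southern hemisphere) and the longitude of Morocco (west of Greenwich) should both be negative, but are reported as positive. For instance: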

$ grep "<http://dbpedia.org/resource/Western_Australia>" geo_coordinates_mappingbased_en.ttl
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "26.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "121.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.georss.org/georss/point> "26.0 121.0".
$ grep "<http://dbpedia.org/resource/Morocco>" geo_coordinates_en.ttl
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "33.53333333333333"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "7.583333333333333"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Morocco> <http://www.georss.org/georss/point> "33.53333333333333 7.583333333333333".

I can't measure it precisely, but my guess is that a few thousand records are corrupted this way.
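
To make the expected behaviour concrete, here is a minimal Python sketch of a compass-direction to signed-degrees conversion. It is not the framework's code: the function name `to_signed_degrees` and its parameters are hypothetical, and the Morocco input (33°32′N, 7°35′W) is only an assumption inferred from the decimal values above.

```python
def to_signed_degrees(degrees, minutes=0.0, seconds=0.0, direction="N"):
    """Convert degrees/minutes/seconds plus a compass direction to signed decimal degrees.

    Hypothetical helper for illustration; not part of the extraction framework.
    """
    direction = direction.strip().upper()
    if direction not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown direction: {direction!r}")
    # Magnitude is always positive; the sign comes from the direction alone.
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative in WGS84 decimal notation.
    return -value if direction in ("S", "W") else value


western_australia_lat = to_signed_degrees(26, direction="S")  # negative latitude
morocco_long = to_signed_degrees(7, 35, direction="W")        # negative longitude
```

If the direction is dropped or ignored, the same inputs yield the positive values seen in the dumps.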

jimkont commented 7 years ago

Thanks for the report, @mazieres. This is a duplicate of #106. We are currently working on replacing the DBpedia mapping language with RML, and such configurations should be enabled then.

VladimirAlexiev commented 7 years ago

@mazieres: #106 explains that the mapping Infobox_Australian_place needs to set a default (constant) latDir of "S", since the default in the code is "N". Can you check which infobox is used by Morocco, and whether the problem is the same?
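
As a rough illustration of why a per-mapping default matters, the following Python sketch falls back to a mapping-level latitude direction before the global one. The names `MAPPING_DEFAULT_LAT_DIR`, `CODE_DEFAULT_LAT_DIR` and `signed_latitude` are hypothetical and do not mirror the framework's actual classes; only the Infobox_Australian_place entry comes from this discussion.

```python
# Hypothetical per-mapping defaults (assumption for illustration).
MAPPING_DEFAULT_LAT_DIR = {"Infobox_Australian_place": "S"}

# Global fallback, matching the "N" default described above.
CODE_DEFAULT_LAT_DIR = "N"


def signed_latitude(degrees, template, lat_dir=None):
    """Return a signed latitude, resolving a missing direction from defaults."""
    # Prefer an explicit direction, then the mapping default, then the global one.
    direction = lat_dir or MAPPING_DEFAULT_LAT_DIR.get(template, CODE_DEFAULT_LAT_DIR)
    return -abs(degrees) if direction.upper() == "S" else abs(degrees)


with_mapping_default = signed_latitude(26.0, "Infobox_Australian_place")  # southern
without_mapping_default = signed_latitude(26.0, "Some_other_infobox")     # northern
```

Without the mapping-level entry, an infobox that omits the direction silently ends up in the northern hemisphere.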

@jimkont There is a PR at #106. Wouldn't it be better to merge this PR so we can fix these mappings, rather than wait for a new technology to be adopted?

m1ci commented 4 years ago

@VladimirAlexiev @mazieres The problem seems to be fixed in the latest DBpedia releases. Here is what we get in the 2020.04.01 release for https://databus.dbpedia.org/dbpedia/generic/geo-coordinates/

<http://dbpedia.org/resource/Western_Australia> <http://www.georss.org/georss/point> "-26.0 121.0" .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "-26.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "121.0"^^<http://www.w3.org/2001/XMLSchema#float> .

The values seem to be correct, i.e. -26.0 and 121.0.

Can we close the issue?

In any case, before closing the issue we need to write a test for this.
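
A possible starting point for such a test, sketched in Python rather than in the framework's own test setup: it reads N-Triples lines like the ones above and checks the sign of the latitude. The helper `read_coordinates` and the test function name are hypothetical, and the regular expression only handles the simple literal form shown in this thread.

```python
import re

WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"
# Matches: <subject> <predicate> "literal"...  (enough for the lines in this thread)
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"')


def read_coordinates(lines):
    """Collect wgs84 lat/long values as {(subject, 'lat' | 'long'): float}."""
    coords = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # e.g. rdf:type triples, whose object is an IRI
        subject, predicate, literal = match.groups()
        if predicate in (WGS84 + "lat", WGS84 + "long"):
            coords[(subject, predicate[len(WGS84):])] = float(literal)
    return coords


def test_western_australia_latitude_is_negative():
    sample = [
        '<http://dbpedia.org/resource/Western_Australia> '
        '<http://www.w3.org/2003/01/geo/wgs84_pos#lat> '
        '"-26.0"^^<http://www.w3.org/2001/XMLSchema#float> .',
    ]
    coords = read_coordinates(sample)
    assert coords[("http://dbpedia.org/resource/Western_Australia", "lat")] < 0
```

The same pattern could be extended with a longitude check for a western-hemisphere resource.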

VladimirAlexiev commented 4 years ago

@m1ci have you checked Casablanca?