dbpedia / mappings-tracker

This project is used for tracking mapping issues in mappings.dbpedia.org

fix Parent places from frwiki (remove takePlace, sharingOut) #29

Open VladimirAlexiev opened 9 years ago

VladimirAlexiev commented 9 years ago

@jplu: Derived from this comment in #8.

Example1: http://fr.wikipedia.org/w/index.php?title=Antioche&action=edit:

 | division                 = [[Région méditerranéenne]]
 | nom de division          = [[Régions de Turquie|Région]]
 | division2                = [[Hatay]]
 | nom de division2         = [[Provinces de Turquie|Province]]
 | division3                = [[Région méditerranéenne]]
 | nom de division3         = [[Districts de Turquie|District]]

I think that "division3" being the same as "division" is a mistake in this infobox.

Example2: http://fr.wikipedia.org/w/index.php?title=Ankara&action=edit

 | division                 = [[Région de l'Anatolie centrale]]
 | nom de division          = [[Régions de Turquie|Région]]
 | division2                = Ankara
 | nom de division2         = [[Provinces de Turquie|Province]]

Notice that unlike the previous example, division2 is a literal (string) not resource (link). But there is http://fr.wikipedia.org/wiki/Ankara_(province), so the fact that it's not linked is a defect in this infobox.
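
The literal-versus-resource distinction can be illustrated with a small Python sketch. The `parse_value` helper and its regular expression are illustrative assumptions, not the DBpedia extraction framework's actual parser, and it only looks at the first `[[target|label]]` link in a value.

```python
import re

# Matches a single wiki link of the form [[target]] or [[target|label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")

def parse_value(raw):
    """Classify an infobox value as a resource (link) or a literal (string)."""
    match = WIKILINK.search(raw)
    if match:
        # Wiki page titles use underscores in place of spaces.
        return ("resource", match.group(1).strip().replace(" ", "_"))
    return ("literal", raw.strip())

print(parse_value("[[Hatay]]"))  # division2 of Antioche: a link
print(parse_value("Ankara"))     # division2 of Ankara: plain text
```

Under this sketch, only the Antioche value could become an object property; the Ankara value stays a string until the infobox is fixed to link to the province page.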

This is mapped in a strange way in 40-50 maps (fr and eo), e.g. see http://mappings.dbpedia.org/index.php?title=Mapping_fr:Infobox_Ville_de_Turquie&action=edit

        {{IntermediateNodeMapping
        | nodeClass = Place
        | correspondingProperty = takePlace
        | mappings =
           {{PropertyMapping | templateProperty = division | ontologyProperty = sharingOut }}
           {{PropertyMapping | templateProperty = nom division | ontologyProperty = name }}
        }}

(repeated for division2,3,4)

This results in the following triples, e.g.:

<Ankara> a Place; takePlace <Ankara__1>, <Ankara__2>.
<Ankara__1> a Place; sharingOut "Région de l'Anatolie centrale". # name <Régions de Turquie> missing because it's a Datatype property
<Ankara__2> a Place; sharingOut "Ankara". # name <Provinces de Turquie> missing because it's a Datatype property

<Antioche> a Place; takePlace <Antioche__1>, <Antioche__2>, <Antioche__3>.
<Antioche__1> a Place; sharingOut "Région méditerranéenne". # name <Régions de Turquie> missing because it's a Datatype property
<Antioche__2> a Place; sharingOut "Hatay". # name <Provinces de Turquie> missing because it's a Datatype property
# <Antioche__3> same as <Antioche__1>, this is a mistake in the infobox

The problems are:

I don't think that many parent places would be missing from frwiki, to justify creating IntermediateNodes for them.

Instead we want triples like this, using widely-used properties in other maps:

<Ankara> a Place; region <Région_de_l'Anatolie_centrale>; province <Ankara_(province)>.
<Antioche> a Place; region <Région_méditerranéenne>; province <Hatay>.

Also:

rtroncy commented 9 years ago

Jumping into this discussion thread. I fully agree that the properties takePlace and sharingOut should be deleted. My question is: why don't you re-use the well-defined Geonames properties for this purpose, namely gn:parentADM1, gn:parentADM2, gn:parentADM3 and gn:parentADM4? See also the ontology. Those properties represent the administrative divisions of places.

jplu commented 9 years ago

OK, I totally agree too. So to get what we want, if I'm not mistaken, we have to replace all the intermediate nodes with a conditional mapping depending on the value of "nom de division"?

For the properties you mention, @rtroncy, I think adding them as "owl:equivalentProperty" to the corresponding DBpedia ones would be OK, no?

VladimirAlexiev commented 9 years ago

@jplu: a conditional mapping depending on the value of "nom de division" won't work, since that's not a general type (e.g. Province) but a specific page (e.g. [[Provinces de Turquie]]). You need to hardcode, e.g.

division  -> region
division2 -> province
division3 -> district # but check there's such prop!

The hardcoding will depend on the template, e.g. for Italy "division2" may map to something different.

@rtroncy: I agree, region/province/district are language-dependent, while ADM1/ADM2/ADM3 say precisely which level is meant. But we need to consider domain/range carefully. Please make an issue for it here. That's a big issue, since thousands of mappings use region/province etc. Alexandru Todor (I can't find his GitHub handle) has some bots that can help. As I wrote in #8, we don't yet have a "Parent Places Mapping" page; do you want to start one?

jplu commented 9 years ago

I don't think that assuming this mapping:

division  -> region
division2 -> province
division3 -> district 

would give a correct result, because nowhere in the infobox template documentation is it written that those parameters must follow the order defined in Geonames. I can perfectly well put "province" in "division" and "region" in "division3".

http://fr.wikipedia.org/wiki/Mod%C3%A8le:Infobox_Ville_de_Turquie#Param.C3.A8tres

VladimirAlexiev commented 9 years ago

@jplu: ok, you'd need something like:

ConditionalMapping
  Case "nom de division" includes "province" -> 
    PropertyMapping templateProperty=division | ontologyProperty=province
  Case "nom de division" includes "région" -> 
    PropertyMapping templateProperty=division | ontologyProperty=region
ConditionalMapping
  Case "nom de division2" includes "province" -> 
    PropertyMapping templateProperty=division2 | ontologyProperty=province
  Case "nom de division2" includes "région" -> 
    PropertyMapping templateProperty=division2 | ontologyProperty=region
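
The case selection in the pseudocode above can be sketched in Python as follows. This only illustrates the "label includes keyword" logic; `KEYWORD_TO_PROPERTY`, `choose_property` and `map_divisions` are hypothetical names, and this is not the mapping wiki's ConditionalMapping syntax.

```python
# Hypothetical case table: substring of the "nom de division" label -> ontology property.
KEYWORD_TO_PROPERTY = [
    ("province", "province"),
    ("région", "region"),
]

def choose_property(label):
    """Return the ontology property for a division label, or None if no case matches."""
    lowered = label.lower()
    for keyword, prop in KEYWORD_TO_PROPERTY:
        if keyword in lowered:
            return prop
    return None

def map_divisions(infobox):
    """Pair each 'division<N>' value with the property chosen from 'nom de division<N>'."""
    pairs = []
    for suffix in ("", "2", "3", "4"):
        value = infobox.get("division" + suffix)
        label = infobox.get("nom de division" + suffix, "")
        prop = choose_property(label)
        if value is not None and prop is not None:
            pairs.append((prop, value))
    return pairs

# Values copied from the Ankara infobox quoted earlier in this issue.
ankara = {
    "division": "[[Région de l'Anatolie centrale]]",
    "nom de division": "[[Régions de Turquie|Région]]",
    "division2": "Ankara",
    "nom de division2": "[[Provinces de Turquie|Province]]",
}
print(map_divisions(ankara))
```

Any label that matches no keyword (for example a district label) is silently dropped, which is why the number of distinct labels in real data matters for this approach.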

Cautions:

But you should also investigate the actual uses with a query like this:

select * {?x <http://fr.dbpedia.org/property/nomDeDivision> ?y} limit 1000

There's a great variety there and I'm not sure what you can do...

For some reason, all are collapsed to nomDeDivision: nomDeDivision2,3,4 return nothing. I'll post a bug: https://github.com/dbpedia/extraction-framework/issues/314

VladimirAlexiev commented 9 years ago

A more useful analysis query:

select ?div (count(*) as ?c) {?x <http://fr.dbpedia.org/property/nomDeDivision> ?div} 
group by ?div order by desc(?c)
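
A minimal Python sketch for running this query from a script, using only the standard library. The endpoint URL is an assumption (the French chapter's public SPARQL endpoint), and `build_url` and `run_query` are illustrative helper names. The request uses the `query` GET parameter from the SPARQL 1.1 Protocol and asks for JSON results.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the French DBpedia chapter serves SPARQL at this URL.
ENDPOINT = "http://fr.dbpedia.org/sparql"

QUERY = """
select ?div (count(*) as ?c) {?x <http://fr.dbpedia.org/property/nomDeDivision> ?div}
group by ?div order by desc(?c)
"""

def build_url(endpoint, query):
    """Encode the query as a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def run_query(endpoint, query):
    """Send the query and return (value, count) pairs; requires network access."""
    request = urllib.request.Request(
        build_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [(b["div"]["value"], int(b["c"]["value"]))
            for b in data["results"]["bindings"]]

# Only the URL is built here; call run_query(ENDPOINT, QUERY) to fetch the counts.
print(build_url(ENDPOINT, QUERY))
```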

There are too many values to write ConditionalMappings for.

VladimirAlexiev commented 9 years ago

The enwiki mapping takes a simpler, more feasible approach: http://mappings.dbpedia.org/index.php?title=Mapping_en:Infobox_settlement&action=edit

    {{ PropertyMapping | templateProperty = subdivision_name | ontologyProperty = country }}
    {{ PropertyMapping | templateProperty = subdivision_name1 | ontologyProperty = isPartOf }}
    {{ PropertyMapping | templateProperty = subdivision_name2 | ontologyProperty = isPartOf }}
    {{ PropertyMapping | templateProperty = subdivision_name3 | ontologyProperty = isPartOf }}

This results in just two levels: country, and everything else:

dbr:Ankara country dbr:Turkey;
  isPartOf dbr:Central_Anatolia_Region, dbr:Ankara_Province.
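
As a rough Python sketch of what this flat mapping does: each mapped parameter yields one triple, with no attempt to tell the levels apart. The property table is copied from the enwiki mapping quoted above; `extract` and the sample infobox values are hypothetical, chosen to reproduce the triples shown.

```python
# Template parameter -> ontology property, as in the enwiki mapping.
PROPERTY_MAP = {
    "subdivision_name": "country",
    "subdivision_name1": "isPartOf",
    "subdivision_name2": "isPartOf",
    "subdivision_name3": "isPartOf",
}

def extract(subject, infobox):
    """Emit one (subject, property, value) triple per mapped, non-empty parameter."""
    return [
        (subject, PROPERTY_MAP[param], value)
        for param, value in infobox.items()
        if param in PROPERTY_MAP and value
    ]

# Hypothetical, already-resolved infobox values for Ankara.
ankara = {
    "subdivision_name": "dbr:Turkey",
    "subdivision_name1": "dbr:Central_Anatolia_Region",
    "subdivision_name2": "dbr:Ankara_Province",
}
for triple in extract("dbr:Ankara", ankara):
    print(triple)
```

Because nothing depends on the order or naming of the levels, this approach is not affected by editors filling the subdivision parameters inconsistently.
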

jplu commented 9 years ago

I think even using "isPartOf" for all could be enough because we don't need the country property as it is created by a constant mapping.

VladimirAlexiev commented 9 years ago

Yes, mapping division->country would be wrong because it's never country

rtroncy commented 9 years ago

Catching up with this thread. I agree that the systematic use of isPartOf (for Province, Region, District, etc.) would be good enough, in particular if the infobox model does not enforce/recommend a clear semantics as being done by Geonames. @VladimirAlexiev Do you still want me to create an issue? About what precisely?

VladimirAlexiev commented 9 years ago

@rtroncy Yes, the issue is that some maps use isPartOf, while others use province, region, and whatever other parent-place properties there are (I think something like "in the territorial unit of").

@jplu Would be nice to write this up on a page

jplu commented 9 years ago

@VladimirAlexiev Should I create a discussion page on the mappings wiki? If yes, attached to which mapping/property page?

jimkont commented 9 years ago

Why not make the isPartOf more systematic, i.e. in all mappings, and add secondary mappings on province/region/etc.?

VladimirAlexiev commented 9 years ago

@jplu there's one editorial page "What's in a Name" listed at the homepage. After it, add a redlink "Mapping Places Best Practices" then create the page from the redlink. You need to research and list all place relations! (search "range place" but more is needed).

Agree with @jimkont: country/province/region can be made subproperties of isPartOf, and used when we know the level of the part.
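
A small Python sketch of what that hierarchy would buy: with the specific properties declared under isPartOf, a consumer (or a reasoner applying rdfs:subPropertyOf) can derive the generic isPartOf triple from any of them. The `SUBPROPERTY_OF` table reflects the proposal, not the current ontology, and `with_inferred` is a hypothetical helper that handles a single level of hierarchy only.

```python
# Assumed hierarchy from the proposal: each specific property sits under isPartOf.
SUBPROPERTY_OF = {
    "country": "isPartOf",
    "province": "isPartOf",
    "region": "isPartOf",
}

def with_inferred(triples):
    """Add the isPartOf triple entailed by each subproperty triple."""
    result = set(triples)
    for subject, prop, obj in triples:
        parent = SUBPROPERTY_OF.get(prop)
        if parent is not None:
            result.add((subject, parent, obj))
    return result

triples = [
    ("Antioche", "province", "Hatay"),
    ("Antioche", "region", "Région_méditerranéenne"),
]
for triple in sorted(with_inferred(triples)):
    print(triple)
```

Mappings that know the level can then emit province or region, while queries that only care about containment can ask for isPartOf.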

jplu commented 9 years ago

@VladimirAlexiev I have created the link on the homepage, and the query below lists all the properties that have dbo:Place as their range or domain:

SELECT DISTINCT ?prop WHERE {{?prop rdfs:range dbpedia-owl:Place}UNION{?prop rdfs:domain dbpedia-owl:Place}}

Do you want me to put this list on the page? Or just the ones we talked about?

VladimirAlexiev commented 9 years ago

Hi @jplu!

select distinct ?prop {{?x ?prop ?y. ?x a dbo:Place} union {?x ?prop ?y. ?y a dbo:Place}}

jplu commented 9 years ago

I have deleted the properties takePlace and sharingOut. Now we use isPartOf instead of takePlace. All the mappings for "nom de division" have been removed.

VladimirAlexiev commented 9 years ago

@jplu

Then resolve. Cheers!