dbpedia / mappings-tracker

This project is used for tracking mapping issues in mappings.dbpedia.org

excessive use of intermediate nodes in French mappings #8

Open VladimirAlexiev opened 9 years ago

VladimirAlexiev commented 9 years ago

Many French mappings use intermediate nodes excessively.

E.g. http://mappings.dbpedia.org/index.php/Mapping_fr:Infobox_Ville_de_Serbie would produce:

<place>
  takePlace [a Place; name "nom division1"; sharingOut <division1>];
  takePlace [a Place; name "nom division2"; sharingOut <division2>];
  politicalLeader [a PoliticalFunction;
     mayor <mayor>; activeYearsStartYear; activeYearsEndYear];
  demographics [a Demographics; populationTotal 123; year "2014"];
  wholeArea [a Area; value 12345].
      <place> populationTotal 123; populationAsOf "2014".

There are previousDemographics and agglomerationDemographics that could be useful, but these are not yet used.

      <place> areaLand 12345.

VladimirAlexiev commented 9 years ago

Did they just make up takePlace, sharingOut, demographics, demographicsAsOf ????

jplu commented 9 years ago

Hi,

Yes, the property takePlace can indeed be replaced by something like isPartOf.

The property sharingOut is there to define the divisions (province, district, etc.) to which the city belongs, so it's not a type. The intermediate node is mainly there to associate the name of each division with the division itself. But I agree that the name does not describe well what it represents. Let's take this example:

{{Infobox Ville de Serbie
| nom de division          = Province
| division                 = [[Serbie centrale]]
| nom de division2         = District
| division2                = [[Belgrade (district)|Belgrade]]
}}

You can see that there are strings and internal links to Wikipedia. If you don't use an intermediate node, how do you know that Province is associated with the resource represented by [[Serbie centrale]] and not [[Belgrade (district)|Belgrade]]?
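The pairing problem described above can be sketched in a few lines of Python. This is only an illustration, not the code of the DBpedia extraction framework: the function name `pair_divisions` and the dictionary input are assumptions made for the example. It groups each `nom de divisionN` with the `divisionN` that carries the same numeric suffix, which is the association an intermediate node preserves.

```python
import re

def pair_divisions(params):
    """Group 'nom de divisionN' with 'divisionN' by their shared suffix N.

    `params` maps infobox parameter names to raw values (hypothetical input
    shape). An empty suffix, as in 'nom de division' / 'division', forms
    its own group.
    """
    groups = {}
    for key, value in params.items():
        m = re.fullmatch(r"(nom de division|division)(\d*)", key.strip())
        if m is None:
            continue  # not a division-related parameter
        kind, suffix = m.groups()
        slot = "name" if kind == "nom de division" else "target"
        groups.setdefault(suffix, {})[slot] = value.strip()
    # Sort by suffix so the unnumbered pair comes first, then 2, 3, ...
    return [groups[s] for s in sorted(groups, key=lambda s: int(s or 0))]

infobox = {
    "nom de division": "Province",
    "division": "[[Serbie centrale]]",
    "nom de division2": "District",
    "division2": "[[Belgrade (district)|Belgrade]]",
}
pairs = pair_divisions(infobox)
```

Each element of `pairs` corresponds to one intermediate node: it keeps a division name together with its own target instead of attaching two names and two targets directly to the city.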

The property demographics is there to associate a date with a population, because a populated place can have many populations, one per year (2011, 2012, 2013, etc.).

For the other properties you mentioned, the explanation is the same as for demographics: they are there to associate a year with a number. You can take the previous example and extend it to those properties to picture the problem.
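A minimal sketch of the year/number association, assuming the unprefixed shorthand property names used earlier in this thread. The helper `demographics_turtle` is hypothetical, and the figure for 2013 is a made-up illustrative value; only the 2014 figure echoes the example in the first comment.

```python
def demographics_turtle(place, populations):
    """Render one intermediate Demographics node per (year, population) pair.

    `populations` maps a year to a population count. The output uses the
    thread's shorthand Turtle without prefixes.
    """
    if not populations:
        return ""
    lines = [f"<{place}>"]
    for year, total in sorted(populations.items()):
        lines.append(
            f'  demographics [a Demographics; populationTotal {total}; year "{year}"];'
        )
    # Replace the trailing ';' of the last statement with the closing '.'
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)

# 2013 value is illustrative only; 2014 value is taken from the example above.
turtle = demographics_turtle("place", {2013: 120, 2014: 123})
print(turtle)
```

With flat properties on `<place>` (two populationTotal values and two populationAsOf values), nothing would say which count belongs to which year; the per-year node keeps each pair together.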

I hope my explanations are clear; in any case, don't hesitate to ask.

VladimirAlexiev commented 9 years ago
jplu commented 9 years ago

I agree they are not associated with the city, but let me push the example further:

{{Infobox Ville de Serbie
| nom de division          = Province
| division                 = Serbie centrale
| nom de division2         = District
| division2                = Belgrade
}}

Now there are only strings, so you cannot go through the links to get what kind of division "Belgrade" is.
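The difference between the two infobox variants can be sketched as follows. The helper `link_target` is a hypothetical name and the pattern only covers a value that consists of a single internal link, so treat it as an illustration rather than a full wikitext parser.

```python
import re

def link_target(value):
    """Return the linked page title if `value` is a single wiki link, else None."""
    m = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value.strip())
    return m.group(1).strip() if m else None

linked = link_target("[[Belgrade (district)|Belgrade]]")
plain = link_target("Belgrade")
```

When `link_target` returns None there is no resource to follow, so the accompanying "nom de division" string is the only remaining hint about the kind of division.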

For wholeArea, some infobox templates also associate properties other than just the value with an Area. Sorry, I don't have examples for this or for the population in mind right now, but I know that I met many such cases when I was studying the French Wikipedia dump.

VladimirAlexiev commented 9 years ago

there are only strings, so you cannot go through the links to get what kind of division "Belgrade" is.

Now I understand.

So the example should become:

<place>
    part [a Place; foaf:name "nom division1"; dc:type <division1>]

I've checked in the downloadable ontology file, and there are 2 more:

:sharingOut a owl:DatatypeProperty , rdf:Property ;
    rdfs:label "sharing out"@en ;
    rdfs:domain :PopulatedPlace ;
    rdfs:range xsd:string .
:sharingOutArea a owl:DatatypeProperty , rdf:Property ;
    rdfs:label "sharing out area"@en ;
    rdfs:domain :PopulatedPlace ;
    rdfs:range xsd:string .
:sharingOutName a owl:ObjectProperty , rdf:Property ;
    rdfs:label "sharing out name of a settlement"@en ;
    rdfs:domain :Settlement ;
    rdfs:range :PopulatedPlace .

Can you please show an example in Turtle of potential data using these properties? (Like the example in my first comment).

sharingOutArea and sharingOutName are not used, except in http://mappings.dbpedia.org/index.php/Mapping_eo:Geokesto: libera1_tipo -> sharingOutName. "libera tipo" means "free type", so I think they mean a string, and because sharingOutName is an ObjectProperty they'll get nothing. They should use "type" (if the target is a page) or "dc:type" (if the target is a string).

For wholeArea, some infobox templates also associate properties other than just the value with an Area. Sorry, I don't have examples for this or for the population in mind right now

I checked at http://fr.dbpedia.org/sparql:

  select * {[] dbpedia-owl:wholeArea [?p ?x] filter (?p!=rdf:type && ?p!=dbpedia-owl:value)}

Indeed, some use min, max, rank.

Now please research the uses of, and add an appropriate comment to the class Demographics.

jplu commented 9 years ago

But why call it "sharingOut" instead of dc:type, which has been defined 20 years ago and is widely used?

You are totally right, it is not a good name.

They should use "type"

Do you mean this property? http://mappings.dbpedia.org/index.php/OntologyProperty:Type

sharingOutArea and sharingOutName are not used

Indeed, they are not used, so we can delete them. And sorry, I don't remember why I created them.

I've commented http://mappings.dbpedia.org/index.php/OntologyClass:Area, see if you agree

Very good yes.

it'd be better to use a new data prop "sortOrder" of type xsd:integer

I totally agree, yes.

Now please research the uses of

Yes, once I remember I will give an example.

add an appropriate comment to the class Demographics.

Is it OK for you? http://mappings.dbpedia.org/index.php/OntologyClass:Demographics

VladimirAlexiev commented 9 years ago

You don't need to remember, use SPARQL. (I first found the 3 most often used props, then discovered the remaining one "rank")

select * {?x a dbpedia-owl:Demographics; ?y ?z 
  filter (!(?y in (rdf:type, dbpedia-owl:year, dbpedia-owl:populationTotal)))} limit 200

I've edited the description to be more specific, please take a look

VladimirAlexiev commented 9 years ago

Yes, when I don't give a prefix I mean dbo:. That is, by "type" I mean "dbo:type", which is exactly the prop that you linked above.

jplu commented 9 years ago

After playing a bit with SPARQL I didn't find any usage of those properties, so we can certainly remove them and change the mapping to something better.

I've edited the description to be more specific, please take a look

It's OK!

VladimirAlexiev commented 9 years ago

Which are "those properties"?

Sorry, this issue has gotten too big :-( Probably best to split it

jplu commented 9 years ago

Sorry, by "those properties" I mean "sharingOutArea" and "sharingOutName".

VladimirAlexiev commented 9 years ago

If unused, definitely kill them

jplu commented 9 years ago

Done!

VladimirAlexiev commented 9 years ago

Instead of takePlace, you should use geographic structure properties that have reasonable names and are used by others. I have not researched this question and AFAIK nobody has defined such a best practice: maybe you can do that?