dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

The dbo:spouse / dbp:spouse information should be extracted as an array #715

Open pkleef opened 3 years ago

pkleef commented 3 years ago

Issue validity

See: http://dief.tools.dbpedia.org/server/extraction/en/extract?title=Joe+Biden&revid=&format=trix&extractors=custom and http://dbpedia.org/resource/Joe_Biden

Error Description

Looking at http://dbpedia.org/resource/Joe_Biden we can see several bad triple patterns:

dbo:spouse
    dbr:Jill_Biden
    dbr:1972_United_States_Senate_election_in_Delaware
    dbr:Neilia_Hunter_Biden

dbp:spouse
    1966-08-27 (xsd:date)
    1972-12-18 (xsd:date)
    1977-06-17 (xsd:date)
    dbr:Jill_Biden
    dbr:Neilia_Hunter_Biden(en)
    died (en)

It looks like the extractor cartridge for Person does not parse the spouse information as an array.

The presence of dbr:1972_United_States_Senate_election_in_Delaware under dbo:spouse also indicates bad parsing.

Pinpointing the source of the error

Details

I believe the code should be changed to use the same pattern as for dbo:termPeriod, e.g.:

dbo:spouse 
     dbr:Joe_Biden__Spouse__1
     dbr:Joe_Biden__Spouse__2
     dbr:Joe_Biden__Spouse__3
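
To make the proposed shape concrete, here is a minimal sketch of how per-marriage intermediate nodes could be generated. It is only an illustration, not the extraction framework's code: the `Marriage` record, the `spouse_triples` helper, and the `ex:` property names on the intermediate node are all hypothetical, and the grouping of dates with spouses is assumed, since the current output lists those values ungrouped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Marriage:
    """One spouse entry (hypothetical record, not a framework type)."""
    spouse: str                       # e.g. "dbr:Jill_Biden"
    start: Optional[str] = None       # ISO date string
    end: Optional[str] = None         # ISO date string
    end_reason: Optional[str] = None  # e.g. "died"


def spouse_triples(subject: str, marriages: List[Marriage]) -> List[Triple]:
    """Emit one intermediate node per marriage, numbered from 1."""
    triples: List[Triple] = []
    for index, m in enumerate(marriages, start=1):
        node = f"{subject}__Spouse__{index}"
        # Link from the person to the intermediate node, as proposed above.
        triples.append((subject, "dbo:spouse", node))
        # Property names below are placeholders (assumption), not ontology terms.
        triples.append((node, "ex:spouse", m.spouse))
        if m.start:
            triples.append((node, "ex:startDate", f'"{m.start}"^^xsd:date'))
        if m.end:
            triples.append((node, "ex:endDate", f'"{m.end}"^^xsd:date'))
        if m.end_reason:
            triples.append((node, "ex:endReason", f'"{m.end_reason}"@en'))
    return triples


if __name__ == "__main__":
    # Grouping of the flat values from the issue is assumed for illustration.
    example = [
        Marriage("dbr:Neilia_Hunter_Biden", "1966-08-27", "1972-12-18", "died"),
        Marriage("dbr:Jill_Biden", "1977-06-17"),
    ]
    for s, p, o in spouse_triples("dbr:Joe_Biden", example):
        print(s, p, o)
```

With this shape, each date and the "died" marker stay attached to the marriage they belong to instead of appearing as unrelated dbp:spouse values.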

jlareck commented 2 years ago

It is not completely clear to me: what should the triples look like? Should we keep these triples:

dbr:Joe_Biden    dbo:spouse       dbr:Jill_Biden
dbr:Joe_Biden    dbo:spouse       dbr:Neilia_Hunter_Biden

? Also, as I understand it, we should add triples of this kind for each of Joe_Biden's spouses:

dbr:Joe_Biden    dbo:spouse      dbr:Joe_Biden__Spouse__1
dbr:Joe_Biden    dbo:spouse      dbr:Joe_Biden__Spouse__2

And do we need to remove:

dbp:spouse
    1966-08-27 (xsd:date)
    1972-12-18 (xsd:date)
    1977-06-17 (xsd:date)
    died (en)

and

dbo:spouse
    dbr:1972_United_States_Senate_election_in_Delaware

? And we should not use dbp:spouse at all, am I right?