dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Dbpedia 0-99 date results #439

Open JanTerlecki opened 8 years ago

JanTerlecki commented 8 years ago

Hi,

I've encountered a bug in the DBpedia service: dates between 0 and 99 AD are mapped to 1900-1999. Example here: http://dbpedia.org/page/Nero

I haven't dug into the DBpedia code, but I'm assuming that this is a JavaScript bug, since it also happened in my code:

var date = new Date("32");
//Thu Jan 01 2032 00:00:00 GMT+0100 (Central European Standard Time)

In my example, JavaScript assumes that the year is 2032. I just wanted to let you know about this bug. I've fixed the issue in my code by adding:

if (year > 0 && year < 100) {
    date.setFullYear(year);
}
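
As a side note on the snippets above, the same two-digit mapping can be reproduced deterministically with the numeric form of the constructor. The sketch below relies on a documented legacy rule: `Date.UTC` and the multi-argument `Date` constructor treat a year from 0 to 99 as 1900 plus that year, while `setUTCFullYear` and `setFullYear` take the year literally. String parsing such as `new Date("32")` is implementation-dependent, so it is not used here. The date 24 January 41 is only an illustrative value.

```javascript
// Legacy rule: a year argument from 0 to 99 is interpreted as 1900 + year.
const mapped = new Date(Date.UTC(41, 0, 24));
console.log(mapped.getUTCFullYear()); // 1941

// setUTCFullYear takes the year literally, so it can undo the mapping.
const fixed = new Date(Date.UTC(2000, 0, 24));
fixed.setUTCFullYear(41);
console.log(fixed.getUTCFullYear()); // 41
console.log(fixed.toISOString()); // 0041-01-24T00:00:00.000Z
```

The UTC variants are used only so the output does not depend on the local time zone.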

I'm not very experienced with JavaScript yet, but I hope it helps.

vedmathai commented 8 years ago

This seems to be related to the plain-text date format used in the infobox. For Jesus Christ, the birth date is shown as 01-01-0001, while the infobox on Wikipedia says 4 BC (this I can't actually explain), but the death date is shown correctly. For the Roman emperors, Wikipedia writes dates as, for example, 24 January 41, and this is mapped to 1941. It probably has something to do with the date parser not supporting this format. I am on it.
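
A minimal sketch of how a day-month-year string such as 24 January 41 could be parsed without any two-digit year expansion. This is JavaScript for consistency with the snippets in this thread, not the extraction framework's actual date parser; `parseDayMonthYear` and `MONTHS` are hypothetical names, and BC dates or ranges such as 7–2 BC are deliberately left unparsed.

```javascript
// Hypothetical helper, not part of the DBpedia extraction framework.
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Parses strings like "24 January 41" and returns an ISO-style
// "YYYY-MM-DD" string, or null if the text does not match.
// The year is taken literally: 41 stays 41, it never becomes 1941.
function parseDayMonthYear(text) {
  const match = /^\s*(\d{1,2})\s+([A-Za-z]+)\s+(\d{1,4})\s*$/.exec(text);
  if (match === null) return null;

  const day = Number(match[1]);
  const month = MONTHS.indexOf(match[2].toLowerCase()) + 1;
  const year = Number(match[3]);
  // Simplistic range check; it does not validate days per month.
  if (month === 0 || day < 1 || day > 31 || year < 1) return null;

  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(year, 4)}-${pad(month, 2)}-${pad(day, 2)}`;
}

console.log(parseDayMonthYear("24 January 41")); // 0041-01-24
console.log(parseDayMonthYear("7–2 BC")); // null
```

Returning a string rather than a `Date` object is a deliberate choice here, since it keeps the year out of reach of any legacy two-digit handling.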

jimkont commented 8 years ago

For Jesus, in the revision we parsed, the date is taken from the Persondata template. The infobox says 'birth_date = 7–2 BC', which we failed to parse.

{{Persondata
...
| DATE OF BIRTH     = 1st century BC
... 
}}

btw, thanks @JanTerlecki for reporting! Based on the data you explored, do you have any idea how we can quantify this and see how many times we got the date wrongly / correctly?
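
One possible way to quantify this, sketched under assumptions: if each extracted date could be paired with the year written in the source wikitext, the suspicious cases would be those where the source year is between 1 and 99 and the extracted year is exactly 1900 higher. The record shape (`sourceYear`, `extractedYear`), the function name `countCenturyShifted`, and the sample rows are all hypothetical; they are not real DBpedia data.

```javascript
// Hypothetical record shape: { sourceYear, extractedYear }.
// Counts how many 1-99 source years exist (candidates) and how many
// of them were extracted as 1900 + sourceYear (shifted).
function countCenturyShifted(records) {
  let candidates = 0;
  let shifted = 0;
  for (const { sourceYear, extractedYear } of records) {
    if (sourceYear >= 1 && sourceYear <= 99) {
      candidates += 1;
      if (extractedYear === sourceYear + 1900) shifted += 1;
    }
  }
  return { candidates, shifted };
}

// Made-up sample rows, for illustration only.
const sample = [
  { sourceYear: 41, extractedYear: 1941 }, // shifted by 1900
  { sourceYear: 68, extractedYear: 68 }, // extracted correctly
  { sourceYear: 1912, extractedYear: 1912 }, // outside the 1-99 range
];
const result = countCenturyShifted(sample);
console.log(result.candidates, result.shifted); // 2 1
```

The ratio of `shifted` to `candidates` would then give a rough error rate for first-century dates, assuming the source years can be obtained reliably.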

vedmathai commented 8 years ago

Can we answer

Based on the data you explored, do you have any idea how we can quantify this and see how many times we got the date wrongly / correctly?

with this.