HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

HeidelTime demo detects age references as date #29

Closed bisoldi closed 9 years ago

bisoldi commented 9 years ago

I copied and pasted the text of the article found here:

http://www.cnn.com/2015/07/15/europe/germany-nazi-death-camp-verdict/?iid=ob_article_footer_expansion&iref=obnetwork

I pasted it into the demo and tried it with all 4 "Document Types". The demo tagged an informally written reference to a person's age as a date.

For example, in the sentence "Groening, who's in his 90s, ...", the demo tagged "90s" as TYPE: DATE with VALUE: 199 (the 1990s).

When the age is written more formally, as one would expect from a professional, mainstream news source (for example, "Groening, who is over 90 years old" or something similar), the demo detects nothing, which I assume is correct. Still, "in his 90s" is too common a way of expressing age to leave unhandled, I would think.

JannikStroetgen commented 9 years ago

Hi, thanks for opening this issue. You are right that expressions such as a person's age should not be extracted. We will add a negative rule to catch expressions such as "in his/her \d\ds".

Thanks again, we'll keep you in the loop.
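
To illustrate the idea of a negative rule for "in his/her \d\ds", here is a minimal Python sketch. It is not HeidelTime's rule syntax or its actual fix. The names `AGE_PATTERN`, `DECADE_PATTERN`, and `decade_candidates` are hypothetical, and the decade pattern is a simplified stand-in that only covers two-digit forms such as "90s" or "90's".

```python
import re

# Hypothetical negative pattern, modeled on "in his/her \d\ds" from the comment above.
AGE_PATTERN = re.compile(r"\bin (?:his|her) \d\d'?s\b", re.IGNORECASE)

# Simplified stand-in for a decade rule: two digits followed by "s" or "'s".
DECADE_PATTERN = re.compile(r"\b\d\d'?s\b")


def decade_candidates(text):
    """Return decade-like strings that do not fall inside an age expression."""
    # Character spans covered by the negative pattern.
    blocked = [m.span() for m in AGE_PATTERN.finditer(text)]
    results = []
    for m in DECADE_PATTERN.finditer(text):
        # Skip any candidate that lies entirely within a blocked span.
        if any(start <= m.start() and m.end() <= end for start, end in blocked):
            continue
        results.append(m.group())
    return results


if __name__ == "__main__":
    print(decade_candidates("Groening, who's in his 90s, was convicted."))
    print(decade_candidates("Music from the 90s was popular."))
```

The sketch finds the age expressions first and then discards any decade candidate inside them, so a plain "the 90s" is still reported while "in his 90s" is not.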

bisoldi commented 9 years ago

Great, thanks! We're just starting to get into implementing HeidelTime in our application and will post any other issues as they come up. I will let you close the issue as you see fit.