HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

Wrong sentence boundary detection prevents matching #50

Open JannikStroetgen opened 7 years ago

JannikStroetgen commented 7 years ago

JAN. 27, 2017 is a date. However, two sentences are extracted, which prevents JAN. 27, 2017 from being matched as a temporal expression.

kno10 commented 7 years ago

I noticed this in my unit tests, too.

But I don't think sentence splitting is HeidelTime's responsibility (except if you use the "NO" tagger).

The Stanford tagger seems to get this right; I don't know about TreeTagger. Since my use case uses Stanford anyway, I did not bother looking into allowing matches across sentence boundaries.

JannikStroetgen commented 7 years ago

We actually manipulate the POS output of TreeTagger for a couple of languages to get rid of incorrect sentence boundaries. I will address this soon.
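
As a rough illustration of that kind of post-processing, here is a minimal Python sketch. It is not HeidelTime's actual code. The function name `merge_false_boundaries`, the `MONTH_ABBREVS` list, and the token-list input format are all assumptions made for this example. The idea is to merge a sentence into the previous one when the previous sentence ends with a month abbreviation and the next one starts with a digit.

```python
# Hypothetical abbreviation list for illustration only;
# a real tagger would use its own language resources.
MONTH_ABBREVS = {
    "jan.", "feb.", "mar.", "apr.", "jun.", "jul.",
    "aug.", "sep.", "sept.", "oct.", "nov.", "dec.",
}


def merge_false_boundaries(sentences):
    """Merge sentences that were split after a month abbreviation.

    `sentences` is a list of sentences, each a list of token strings.
    A sentence is appended to its predecessor when the predecessor ends
    with a month abbreviation and the sentence starts with a digit.
    """
    merged = []
    for tokens in sentences:
        previous = merged[-1] if merged else None
        if (
            previous
            and tokens
            and previous[-1].lower() in MONTH_ABBREVS
            and tokens[0][:1].isdigit()
        ):
            # Likely a false boundary: glue the two token lists together.
            merged[-1] = previous + list(tokens)
        else:
            merged.append(list(tokens))
    return merged


# The case from this issue, as a splitter might wrongly produce it.
split = [["JAN."], ["27", ",", "2017", "is", "a", "date", "."]]
fixed = merge_false_boundaries(split)
```

With this heuristic, a sentence that genuinely ends in a month abbreviation and is followed by a sentence starting with a word is left untouched. One that is followed by a sentence starting with a number would still be merged, so this is only a heuristic.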