AmyOlex / Chrono

Parsing time normalizations from text.
GNU General Public License v3.0

To include ordinal characters in DayOfMonth or not? #58

Closed AmyOlex closed 6 years ago

AmyOlex commented 6 years ago

The gold standard files are inconsistent about whether the ordinal characters are included in the span for DayOfMonth entities. For example, in file ID011_clinic_031, for the phrase "seen on October 7th." the gold standard identifies only the "7", whereas Chrono returns "7th". In other files, however, the gold standard includes the ordinal characters: in file ID051_clinic_148, for the phrase "until March 8 or 9th", it returns "9th" and not "9". I will email Egoitz to figure out which is correct. When I changed the code to drop the ordinal characters and ran it on the testing files, we did worse because we were no longer returning the full ordinal value as the day.

AmyOlex commented 6 years ago

Egoitz said the span should include the ordinal characters, so "7th" is the correct raw token, but the value must be "7". Looking back, the only gold standard file with this issue is ID011_clinic_031. I'm emailing him to let him know.
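
For reference, here is a minimal sketch of the rule Egoitz described: the span covers the full token including the ordinal suffix, while the value is the bare number. This is not Chrono's actual code. The function name `find_day_of_month`, the regular expression, and the tuple layout are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: a 1-2 digit number with an optional ordinal suffix.
_DAY_RE = re.compile(r"\b(\d{1,2})(st|nd|rd|th)?\b", re.IGNORECASE)


def find_day_of_month(text):
    """Return (start, end, raw_token, value) tuples for day-of-month candidates.

    The span (start, end) and raw_token keep the ordinal characters;
    the value is the integer with the suffix stripped.
    """
    results = []
    for match in _DAY_RE.finditer(text):
        value = int(match.group(1))
        if 1 <= value <= 31:  # keep only plausible days of the month
            results.append((match.start(), match.end(), match.group(0), value))
    return results


print(find_day_of_month("seen on October 7th."))
# [(16, 19, '7th', 7)]
print(find_day_of_month("until March 8 or 9th"))
# [(12, 13, '8', 8), (17, 20, '9th', 9)]
```

Keeping the raw token and the normalized value as separate fields means the span can be scored against the gold standard offsets while the value stays a plain number.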