Not identifying MinuteOfHour Entities

AmyOlex / Chrono

Parsing time normalizations from text.

GNU General Public License v3.0


Not identifying MinuteOfHour Entities #35

Closed AmyOlex closed 6 years ago

AmyOlex commented 6 years ago

We are missing most of the MinuteOfHour entities in the cancer corpus.

AmyOlex commented 6 years ago

This was caused by the hh:mm time format, which is frequent in the clinical texts. Our methods were looking specifically for the hh:mm:ss format and were not catching the more common hh:mm format. I edited the code to also identify the hh:mm and h:mm formats, and we are now getting over 0.90 F1 for this entity. I think the same problem is also affecting HourOfDay, which is not getting all the sub-intervals it is supposed to.
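For anyone hitting the same problem, here is a minimal sketch of one way to match all three formats (hh:mm:ss, hh:mm, and h:mm) with a single regular expression. This is not the code from Chrono; the names `TIME_RE` and `find_times` are hypothetical and used for illustration only, and the sketch assumes 24-hour times with a colon separator.

```python
import re

# Hypothetical pattern, NOT taken from the Chrono source.
# Assumes 24-hour times: the hour is one or two digits (0-23),
# the minute is two digits (00-59), and the seconds field is optional.
TIME_RE = re.compile(
    r"(?<![\d:])"                  # not preceded by a digit or colon
    r"(?P<hour>[01]?\d|2[0-3])"    # h or hh
    r":(?P<minute>[0-5]\d)"        # mm
    r"(?::(?P<second>[0-5]\d))?"   # optional ss
    r"(?![\d:])"                   # not followed by a digit or colon
)


def find_times(text):
    """Return (hour, minute, second) tuples; second is None when absent."""
    results = []
    for match in TIME_RE.finditer(text):
        second = match.group("second")
        results.append((
            int(match.group("hour")),
            int(match.group("minute")),
            int(second) if second is not None else None,
        ))
    return results


print(find_times("Seen at 9:30, again at 14:05 and 08:15:42."))
```

Making the seconds group optional means one pattern covers every variant, so the hour and minute are captured even when the text carries no seconds. The lookarounds keep the pattern from matching inside longer digit-and-colon runs.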