AmyOlex / Chrono

Parsing time normalizations from text.
GNU General Public License v3.0

Duplicate Hour of Day entities in the Chrono entity list. #36

Closed: AmyOlex closed this issue 6 years ago

AmyOlex commented 6 years ago

After fixing MinuteOfHour, I expected HourOfDay to improve as well, but it didn't, because the sub-intervals were not correct. We were predicting 51 Hour entities but 91 sub-intervals, which is very odd. We need to figure out how we can predict more sub-intervals than entities.
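One quick way to test the hunch that some phrases yield more than one entity of the same type is to group the predictions by source phrase and count types per phrase. This is a minimal sketch that assumes each prediction can be reduced to a `(phrase_id, entity_type)` pair. That representation and the `find_duplicate_types` helper are hypothetical, not part of Chrono's actual data model.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_duplicate_types(predictions: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Return, per phrase id, the entity types predicted more than once.

    `predictions` is a hypothetical flat list of (phrase_id, entity_type) pairs.
    """
    counts: Dict[int, Counter] = {}
    for phrase_id, entity_type in predictions:
        counts.setdefault(phrase_id, Counter())[entity_type] += 1
    # Keep only phrases where at least one type occurs more than once.
    return {
        pid: sorted(t for t, n in c.items() if n > 1)
        for pid, c in counts.items()
        if any(n > 1 for n in c.values())
    }


# Illustrative data only: phrase 0 got two HourOfDay entities, phrase 1 did not.
preds = [
    (0, "HourOfDay"), (0, "MinuteOfHour"), (0, "AMPMOfDay"), (0, "HourOfDay"),
    (1, "HourOfDay"), (1, "MinuteOfHour"),
]
print(find_duplicate_types(preds))  # {0: ['HourOfDay']}
```

Any phrase this reports is a candidate for an extra, unlinked entity that inflates the counts.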

AmyOlex commented 6 years ago

Ok, I figured this one out! We first parse the time phrase "09:00 AM" with the "buildChronoHourOfDay" method, which identifies "09" as the hour. We then parse it again with the AMPM method to identify the "AM" part, but that method also re-parses the phrase for an hour and adds a second hour entity. Because our sub-interval building method assumes only ONE entity of each type per phrase, it grabs the first hour entity, links it correctly to the MinuteOfHour entity, and leaves the second one unlinked. I fixed this by having the BuildAMPM method check the "flags" array: if an hour already exists, it skips adding another one.

However, we are still not getting all the Hours correct. The spans are spot on, but we are missing the AMPM of Day links.
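The guard described above can be sketched roughly as follows. This is not Chrono's code: the parsers are toy stand-ins, the entity tuples are simplified, and the shared `flags` is modeled as a per-phrase dict rather than whatever structure Chrono actually uses. The point is only the pattern: the first parser records that it found an hour, and the AM/PM parser adds its own hour only when none is recorded.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # hypothetical: (entity_type, matched_text)


def build_hour_of_day(phrase: str, entities: List[Entity], flags: Dict[str, bool]) -> None:
    """Toy hour parser (hypothetical): treats the digits before ':' as the hour."""
    hour = phrase.split(":", 1)[0].strip()
    if hour.isdigit():
        entities.append(("HourOfDay", hour))
        flags["hour"] = True  # record that this phrase already has an hour


def build_ampm(phrase: str, entities: List[Entity], flags: Dict[str, bool]) -> None:
    """Toy AM/PM parser (hypothetical) that, like the one described, also looks for an hour."""
    tokens = phrase.split()
    if not tokens or tokens[-1].upper() not in ("AM", "PM"):
        return
    entities.append(("AMPMOfDay", tokens[-1].upper()))
    # Guard: only add an hour if no earlier parser has already done so.
    if not flags.get("hour"):
        hour = tokens[0].split(":", 1)[0]
        if hour.isdigit():
            entities.append(("HourOfDay", hour))
            flags["hour"] = True


def parse_phrase(phrase: str) -> List[Entity]:
    entities: List[Entity] = []
    flags: Dict[str, bool] = {}  # fresh flags for each phrase
    build_hour_of_day(phrase, entities, flags)
    build_ampm(phrase, entities, flags)
    return entities


print(parse_phrase("09:00 AM"))  # [('HourOfDay', '09'), ('AMPMOfDay', 'AM')]
print(parse_phrase("9 PM"))      # [('AMPMOfDay', 'PM'), ('HourOfDay', '9')]
```

With the guard, "09:00 AM" yields a single HourOfDay. A phrase like "9 PM", which the first parser misses, still gets its hour from the AM/PM path. Note that this only removes the duplicate. Linking the AMPMOfDay entity to the hour is a separate step and is not shown here.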