AmyOlex / Chrono

Parsing time normalizations from text.
GNU General Public License v3.0

Newlines are not delimiting temporal phrases. #59

Closed AmyOlex closed 6 years ago

AmyOlex commented 6 years ago

File ID051_clinic_148 has the following text: "my notes from December.

  1. Ulcerative colitis."

Here "December" and "2" are separated by a newline, but the program doesn't seem to recognize that break. I need to review the newline handling in the temporal phrase extraction algorithm to figure out what is going on.

AmyOlex commented 6 years ago

I fixed this by rewriting the getWhitespaceTokens() method in utils.py. It now also performs sentence tokenization to identify the last word of each sentence. When the temporal phrase extractor finds that a token is the last token in a sentence, it ends the current temporal phrase and starts a new one. This has eliminated false positives like the one in the example above.
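
For anyone reading later, here is a minimal sketch of the idea, not the actual code in utils.py. The function names (`whitespace_tokens_with_sentence_ends`, `extract_temporal_phrases`), the tiny month lexicon, and the rule for spotting sentence ends (trailing `.`, `!`, `?`, or a following newline) are all assumptions made for illustration. The real fix relies on sentence tokenization, which this sketch replaces with a simple regex rule so that it runs with the standard library only.

```python
import re

# Hypothetical, deliberately tiny lexicon; the real project has its own rules.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
_GAP = re.compile(r"\s*")


def whitespace_tokens_with_sentence_ends(text):
    """Split on whitespace and flag tokens that close a sentence.

    Assumption: a token closes a sentence if it ends in '.', '!' or '?',
    if the whitespace after it contains a newline, or if it is the last token.
    """
    tokens = []
    for match in re.finditer(r"\S+", text):
        tok = match.group()
        gap = _GAP.match(text, match.end()).group()
        is_end = tok[-1] in ".!?" or "\n" in gap or match.end() == len(text)
        tokens.append((tok, is_end))
    return tokens


def is_temporal(tok):
    """Very rough temporal test: a month name or a bare number."""
    core = tok.strip(".,;:!?()").lower()
    return core in MONTHS or core.isdigit()


def extract_temporal_phrases(text, break_on_sentence_end=True):
    """Group adjacent temporal tokens into phrases.

    With break_on_sentence_end=True, a sentence-final token also closes
    the phrase, so a phrase never spans two sentences.
    """
    phrases, current = [], []
    for tok, is_end in whitespace_tokens_with_sentence_ends(text):
        if is_temporal(tok):
            current.append(tok.strip(".,;:!?()"))
        elif current:
            phrases.append(" ".join(current))
            current = []
        if break_on_sentence_end and is_end and current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


text = "my notes from December.\n\n  1. Ulcerative colitis."
print(extract_temporal_phrases(text, break_on_sentence_end=False))  # ['December 1']
print(extract_temporal_phrases(text))                               # ['December', '1']
```

Without the sentence-end check, the month and the list number are merged into a single phrase, which is the false positive reported above. With the check, "December" closes its phrase at the end of its sentence and the list number starts a new one.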