emorynlp / nlp4j-tokenization

Tokenize raw texts into tokens and sentences.
Other
6 stars 4 forks source link

Tokenizing dates ranges #6

Closed mzhai2 closed 8 years ago

mzhai2 commented 8 years ago

Date ranges like 1945-1949 should be tokenized to 1945 - 1949.

mzhai2 commented 8 years ago

you can match with ([0-9]+)-([0-9]+) \1 - \2

jdchoi77 commented 8 years ago

This feature will be added in the version 1.1.2. Thanks.