emorynlp / nlp4j-tokenization

Tokenize raw texts into tokens and sentences.

Local #2

Closed amit-deshmane closed 8 years ago

amit-deshmane commented 8 years ago

Updated tokenization to capture offset (start, end) information for each token. Other projects that call tokenize()/segmentize() will fail to compile, since these methods no longer return List / List<List>.
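For readers following along, here is a rough sketch of the kind of change described above. It is not the code in this PR: the `Token` class, the `forms()` adapter, and the whitespace-only splitting are all hypothetical stand-ins, used only to show how a tokenizer can return offsets and how older callers expecting plain strings might be adapted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: names and logic here are assumptions, not the nlp4j API.
class OffsetTokenizerSketch {
    /** Hypothetical token type: surface form plus [start, end) character offsets. */
    static final class Token {
        final String form;
        final int start; // inclusive offset into the original text
        final int end;   // exclusive offset into the original text

        Token(String form, int start, int end) {
            this.form = form;
            this.start = start;
            this.end = end;
        }
    }

    /** Whitespace-only splitting; stands in for the real tokenization rules. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume characters until the next whitespace or end of text.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    /** Adapter for callers that still expect a plain list of strings. */
    static List<String> forms(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) out.add(t.form);
        return out;
    }
}
```

With a design like this, `text.substring(t.start, t.end)` recovers each token's form from the original input, and downstream projects can call an adapter such as `forms(tokenize(text))` until they are updated to use the offsets directly.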

Please review and merge, and also update the other projects that depend on these methods.

jdchoi77 commented 8 years ago

I'm done merging; just one question, though. What's the use of the Index class? Isn't it just a boxed wrapper around the primitive int?