Wordseer / wordseer

The WordSeer text analysis tool, written in Flask.
http://wordseer.berkeley.edu/

resolve sentence tokenizer differences #269

Closed macfarlandian closed 9 years ago

macfarlandian commented 9 years ago

We use NLTK in structureextractor but Stanford in stringprocessor, and the two sometimes disagree on sentence breaks. This causes data loss, because stringprocessor discards all words after the first sentence whenever it detects more than one.
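To make the failure mode concrete, here is a minimal sketch of how two disagreeing splitters can drop words. It does not use NLTK or the Stanford tools and is not WordSeer code: `split_capital_aware`, `split_naive`, `words_first_sentence_only`, and `words_all_sentences` are hypothetical stand-ins written only to illustrate the problem and one possible lossless alternative.

```python
import re


def split_capital_aware(text):
    """Hypothetical splitter A: break after . ! ? only if an uppercase letter follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]


def split_naive(text):
    """Hypothetical splitter B: break after every . ! ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words_first_sentence_only(sentence, resplit):
    """Lossy behaviour as described in this issue: re-split, keep the first piece only."""
    parts = resplit(sentence)
    return parts[0].split() if parts else []


def words_all_sentences(sentence, resplit):
    """Lossless alternative (an assumption, not the project's fix): keep every piece."""
    return [word for part in resplit(sentence) for word in part.split()]


text = "It cost approx. five dollars. He paid."

# Stage 1 (stand-in for structureextractor) sees two sentences.
sentences = split_capital_aware(text)

# Stage 2 (stand-in for stringprocessor) re-splits the first one differently.
lossy = words_first_sentence_only(sentences[0], split_naive)
lossless = words_all_sentences(sentences[0], split_naive)

print(sentences)
print(lossy)
print(lossless)
```

In this sketch the second stage splits again at "approx.", so the lossy path silently drops "five dollars." from the first sentence. Using one tokenizer in both places, or keeping every sub-sentence as in `words_all_sentences`, would avoid the loss.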