Closed by MojoJolo 9 years ago.
TextTeaser uses OpenNLP to split sentences, and OpenNLP needs a sentence detector model trained on a corpus for the text's language before it can split sentences properly.
Here's the list of languages for which OpenNLP provides models: http://opennlp.sourceforge.net/models-1.5/
OpenNLP's sentence detector supports only a limited set of languages. It would be good to support other languages too, e.g. Russian, Chinese, and Japanese.
Here are the instructions for training a sentence detector from a corpus: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.sentdetect.training
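The linked manual describes the training data format: one sentence per line, with an empty line marking a document boundary. A contributor adding a new language would first need text already split into sentences. The sketch below assumes exactly that and serializes it into this format. The function name `to_opennlp_sentdetect_format` and the sample data are hypothetical and not part of TextTeaser or OpenNLP. The resulting file would then go to OpenNLP's training tooling, and the manual above gives the exact command and options.

```python
from typing import Iterable, List


def to_opennlp_sentdetect_format(documents: Iterable[List[str]]) -> str:
    """Hypothetical helper: serialize pre-split documents into the
    OpenNLP sentence detector training format (one sentence per line,
    an empty line between documents)."""
    blocks = []
    for sentences in documents:
        # Collapse internal whitespace (including newlines) to single
        # spaces, because a newline inside a sentence would be read as
        # a sentence boundary by the trainer.
        cleaned = [" ".join(s.split()) for s in sentences]
        # Skip sentences that are empty after cleaning.
        cleaned = [s for s in cleaned if s]
        if cleaned:
            blocks.append("\n".join(cleaned))
    # Joining with "\n\n" leaves one empty line between documents.
    return "\n\n".join(blocks) + ("\n" if blocks else "")


# Usage: two small Russian "documents" (made-up sample text).
docs = [
    ["Привет, мир.", "Как дела?"],
    ["Сегодня  тепло.\n", ""],
]
sample = to_opennlp_sentdetect_format(docs)
print(sample)
```

Whitespace is normalized because the format uses line breaks as the only sentence delimiter. A real corpus for Chinese or Japanese would also need care here, since those languages do not separate words with spaces.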