Closed jasonswearingen closed 10 years ago
The corpus files are used by OpenNLP to split text into sentences. You can see models for other languages here: http://opennlp.sourceforge.net/models-1.5/
And if you want to create a model yourself, here are the instructions from OpenNLP: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.sentdetect.training
thanks for the info, i guess due to no NLP experience I won't be able to contribute though :(
@jasons-novaleaf NLP experience is not required. :) You can contribute by gathering articles in the language you choose. Split those articles into sentences, one sentence per line. The result can then be used as a corpus. The instructions for building a corpus are easy to follow, just use the link I posted above.
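To make the corpus step concrete, here is a minimal sketch in Python. It assumes you have already split each article into sentences by hand; the function name `write_sentence_corpus` and the file name `xx-sent.train` are hypothetical, not part of this project. It writes one sentence per line and puts an empty line after each article, which is how the OpenNLP manual describes the sentence detector training format.

```python
from pathlib import Path


def write_sentence_corpus(articles, out_path):
    """Write hand-split articles as a one-sentence-per-line training file.

    `articles` is a list of articles; each article is a list of sentences.
    (Hypothetical helper for illustration, not part of OpenNLP.)
    """
    lines = []
    for sentences in articles:
        # Collapse internal whitespace so every sentence stays on a single line.
        cleaned = [" ".join(s.split()) for s in sentences]
        cleaned = [s for s in cleaned if s]
        if not cleaned:
            continue
        lines.extend(cleaned)
        lines.append("")  # an empty line marks the end of one article
    text = "\n".join(lines)
    Path(out_path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    sample = [
        ["This is the first sentence.", "Here is a second one."],
        ["Another article starts here."],
    ]
    write_sentence_corpus(sample, "xx-sent.train")
```

The resulting file should then be usable as the `-data` input of the `opennlp SentenceDetectorTrainer` command described in the manual linked above; check that page for the exact options for your OpenNLP version.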
i see there are a couple of binary corpus files, but i don't see any info on how they were generated and/or how to add support for additional languages.