acidjunk opened 8 years ago
@clusterfudge -> could you elaborate on the label "ready"? If I understand correctly, some translation work (e.g. on the Tokenizer) is needed to support other languages. I read about it in #5. It seems that OleanderStemmingLibrary already supports Dutch, but its docs are limited to a class reference. Is there anything I can do to work on or test this?
Hey @acidjunk, the label is an artifact of the new task-tracking integration we're using (waffle.io). This issue had actually slipped past me.
I'm going to be working on some docs for contributing new language ports/proofs-of-concept for Adapt in the coming week or so, and will share them with you for review.
You are, however, right on track as to what needs to be done. The only part of Adapt (thus far) that's English-specific is the tokenizer. Adding a new language would involve verifying that the tokenizer works (at least partly) with the punctuation of the new language, then providing working samples in that language. There may also be some effort involved in forcing UTF-8 encoding throughout the code, though I haven't run into any of that yet.
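To make the punctuation check concrete, here is a minimal sketch, assuming Python 3 (where `str` is Unicode and `\w` matches accented letters). `tokenize`, `tokenize_nl` and `DUTCH_CLITICS` are hypothetical names for illustration; this is not Adapt's actual tokenizer. It shows a generic rule that copes with Dutch accents and hyphenated words but splits clitics like `'t`, which is the kind of gap a new language port would need to catch and patch.

```python
import re
import unicodedata

# Hypothetical, simplified tokenizer for illustration only (NOT Adapt's tokenizer).
# A word is a run of Unicode word characters, optionally joined by internal
# apostrophes or hyphens; any other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text):
    # Normalize to NFC so "e" + combining accent becomes a single "é",
    # otherwise the combining mark would not count as a word character.
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)


# Assumption for this sketch: Dutch clitics that start with an apostrophe,
# which the generic rule above splits into "'" and a letter.
DUTCH_CLITICS = {"'t", "'s", "'n", "'k"}


def tokenize_nl(text):
    tokens = tokenize(text)
    merged = []
    i = 0
    while i < len(tokens):
        # Re-join an apostrophe token with the following token if together
        # they form a known Dutch clitic.
        pair = tokens[i] + tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] == "'" and pair in DUTCH_CLITICS:
            merged.append(pair)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(tokenize("Hoe laat is 't in Zürich?"))
print(tokenize_nl("Hoe laat is 't in Zürich?"))
```

The generic `tokenize` yields `'` and `t` as separate tokens, while `tokenize_nl` keeps `'t` together. Comparing the two on a set of sample sentences is one cheap way to see where a new language's punctuation breaks the existing rules.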
Could you point me to some docs?
@clusterfudge -> all the "language" tickets are labelled "ready". Can you remove that label from them? (I don't have permission to change labels.)
Done!
I read some comments in other issues about translating parts of the tokenizer.
I'm happy to help; just looking for an easy starting point.