Reserve an entry point for a custom tokenizer (including POS tagging)

Hi, I was searching for a corpus manager and found this project. It looks very promising.

I mainly work with Chinese text. Coquery doesn't work well with Chinese at the moment, and I am afraid it may never work well with Chinese in the future.

So I am here just to make a recommendation for the future: reserve an entry point (a plugin?) for a custom tokenizer (including POS tagging).

We usually don't use Stanford CoreNLP for Chinese; it is inconvenient and less accurate. The popular Chinese tokenizers I have seen usually provide two methods, tokenize and tokenize_with_postag. There is no way to tag the words of a text that has already been tokenized (unless you simply assign each word its most frequent POS tag, but that is the wrong way to do it). This is different from English. For example, the project spaCy (https://github.com/explosion/spacy) splits tokenization and POS tagging into two separate pipeline steps, which makes Chinese integration much more difficult.
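
To make the request more concrete, here is a minimal sketch of the kind of entry point I mean. Everything in it is hypothetical: `register_tokenizer`, `tokenize`, and the toy lexicon are names I made up for illustration and are not part of Coquery or of any real tokenizer library. The only point is that a plugin returns tokens and POS tags together in a single call, so tokenization and tagging are never forced into two separate steps.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A token paired with its POS tag; the tag is None if the tokenizer does not tag.
TaggedToken = Tuple[str, Optional[str]]
TokenizerFunc = Callable[[str], List[TaggedToken]]

# Hypothetical registry: language code -> tokenizer plugin.
_TOKENIZERS: Dict[str, TokenizerFunc] = {}


def register_tokenizer(language: str, func: TokenizerFunc) -> None:
    """Register a plugin that tokenizes and tags in one step."""
    _TOKENIZERS[language] = func


def whitespace_tokenizer(text: str) -> List[TaggedToken]:
    """Fallback for languages without a plugin: split on whitespace, no tags."""
    return [(token, None) for token in text.split()]


def tokenize(text: str, language: str) -> List[TaggedToken]:
    """Use the registered plugin for the language, else the fallback."""
    func = _TOKENIZERS.get(language, whitespace_tokenizer)
    return func(text)


# Toy stand-in for a real Chinese tokenizer (assumption: a tiny fixed lexicon).
_TOY_LEXICON = {"我": "r", "喜欢": "v", "语料库": "n"}


def toy_chinese_tokenizer(text: str) -> List[TaggedToken]:
    """Greedy longest-match segmentation that returns (word, tag) pairs."""
    result: List[TaggedToken] = []
    max_len = max(len(word) for word in _TOY_LEXICON)
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in _TOY_LEXICON:
                result.append((piece, _TOY_LEXICON[piece]))
                i += size
                break
        else:
            # Unknown character: emit it alone with a placeholder tag.
            result.append((text[i], "x"))
            i += 1
    return result


register_tokenizer("zh", toy_chinese_tokenizer)
```

With something like this, `tokenize("我喜欢语料库", "zh")` would go through the registered plugin, while any other language code would fall back to plain whitespace splitting. A real plugin would simply wrap a tokenizer's own tokenize_with_postag-style call instead of the toy lexicon.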

I hope I can use this project in the future, and I hope it keeps getting better and better.

gkunter / coquery

Reserve an entry point for a custom tokenizer (including POS tagging) #287