Build my own corpus for a specific language with its part-of-speech tags

I think that if I want to use a specific tokenizer (for example, one for processing CJK languages) to build a corpus with part-of-speech annotation, I should implement my own token stream, assign it to a CorpusData object, and call its encode method to write the corpus. Then, with the help of the decode function in https://github.com/PolMine/polmineR, I can run CQP queries on my own corpus. (That way I would only need to install cwbtools and polmineR, without needing the CWB tools from http://cwb.sourceforge.net/devs.php.)
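The token stream described above is essentially a table with one row per token. The following is a minimal, language-neutral sketch in Python (not the cwbtools API, which is R) of that shape, assuming the tokenizer already returns (word, POS) pairs. The column names `cpos`, `id`, `word` and `pos`, the sample documents and the tag values are illustrative assumptions; the exact columns CorpusData expects should be checked against the cwbtools documentation.

```python
# Hypothetical tokenizer output: (surface form, POS tag) pairs per document.
# A real CJK tokenizer would produce these; the tags are placeholders.
tagged_docs = {
    "doc1": [("我", "r"), ("喜欢", "v"), ("语料库", "n")],
    "doc2": [("今天", "t"), ("下雨", "v")],
}


def build_tokenstream(docs):
    """Flatten tagged documents into one row (dict) per token.

    Each row carries a corpus position (cpos), counted from 0 across the
    whole corpus, the id of the document it belongs to, and one column per
    positional attribute (here: word and pos).
    """
    rows = []
    cpos = 0
    for doc_id, tokens in docs.items():
        for word, pos in tokens:
            rows.append({"cpos": cpos, "id": doc_id, "word": word, "pos": pos})
            cpos += 1
    return rows


if __name__ == "__main__":
    for row in build_tokenstream(tagged_docs):
        print(row)
```

The point of the sketch is only that the POS tags travel alongside each token as an extra column; whatever tagset the tokenizer emits is stored as plain strings.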

I would like to know whether this is right.

Also, can the lexer used to parse CQP queries match the "pos" values I defined with my own tokenizer?
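As a toy illustration of the second question: a CQP attribute constraint such as `[pos="v"]` compares a regular expression, matched against the whole value, with the values stored for that positional attribute. The matcher below is not CQP and not polmineR; it is a hypothetical sketch that mimics only a single-token constraint over a token stream, assuming `pos` has been encoded as a positional attribute. The tokens and tags are made up.

```python
import re

# Hypothetical token stream with a custom POS tagset.
tokens = [
    {"cpos": 0, "word": "我", "pos": "r"},
    {"cpos": 1, "word": "喜欢", "pos": "v"},
    {"cpos": 2, "word": "语料库", "pos": "n"},
    {"cpos": 3, "word": "下雨", "pos": "v"},
]

# Recognises a single constraint of the form [attribute="regex"].
CONSTRAINT = re.compile(r'^\[(\w+)="(.*)"\]$')


def match_token_query(query, tokens):
    """Return corpus positions whose attribute value fully matches the regex.

    Only one attribute constraint per query is supported; real CQP handles
    much more (token sequences, boolean operators, flags, structures).
    """
    m = CONSTRAINT.match(query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    attribute, pattern = m.groups()
    if tokens and attribute not in tokens[0]:
        raise ValueError(f"unknown attribute: {attribute!r}")
    regex = re.compile(pattern)
    # The whole attribute value must match, hence fullmatch.
    return [t["cpos"] for t in tokens if regex.fullmatch(t[attribute])]


if __name__ == "__main__":
    print(match_token_query('[pos="v"]', tokens))
```

Because matching works on the stored strings, the sketch does not care which tagset produced them.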

(PolMine / cwbtools, issue #41)