Allow pre-tokenised text

Similar to https://github.com/TakeLab/spacy-udpipe/issues/13, it would be nice to have an option to disable the tokenizer and pass tokens (a list of strings) directly to the rest of the pipeline. For instance, in spaCy we can easily swap out the tokenizer:

nlp.tokenizer = nlp.tokenizer.tokens_from_list

This would be helpful!

It would also be great if this could be combined with the aforementioned issue (https://github.com/TakeLab/spacy-udpipe/issues/13), so that you can pass text that is both pre-tokenized and pre-segmented into sentences.
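
To make the request concrete, here is a rough sketch of the input handling I have in mind. It is plain Python and not part of spacy-udpipe: `normalize_input` and `naive_tokenizer` are hypothetical names, and treating a flat list of strings as a single sentence is my own assumption. The idea is that raw strings still go through the model's tokenizer, while lists are passed through untouched.

```python
from typing import Callable, List, Sequence, Union

Tokens = List[str]
Sentences = List[Tokens]


def normalize_input(
    text: Union[str, Sequence[str], Sequence[Sequence[str]]],
    segment_and_tokenize: Callable[[str], Sentences],
) -> Sentences:
    """Return a list of sentences, each a list of token strings."""
    if isinstance(text, str):
        # Raw text: fall back to the regular tokenizer.
        return segment_and_tokenize(text)
    items = list(text)
    if all(isinstance(item, str) for item in items):
        # Pre-tokenized: assume one sentence and keep the tokens as given.
        return [items]
    # Pre-tokenized and pre-segmented: one inner list per sentence.
    return [list(sentence) for sentence in items]


def naive_tokenizer(raw: str) -> Sentences:
    # Hypothetical stand-in for the model's tokenizer: whitespace split,
    # everything in one sentence.
    return [raw.split()]


print(normalize_input("Hello world !", naive_tokenizer))
print(normalize_input(["Hello", "world", "!"], naive_tokenizer))
print(normalize_input([["Hello", "world"], ["Bye", "now"]], naive_tokenizer))
```

The rest of the pipeline (tagging, parsing) would then only ever see the normalized list of sentences, regardless of which of the three input forms was used.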
