Similar to https://github.com/TakeLab/spacy-udpipe/issues/13, it would be nice to have an option to disable the tokenizer and pass tokens (a list of strings) directly as input to the rest of the pipeline. For instance, in spaCy, we can easily swap out the tokenizer.
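To illustrate the spaCy side of this, here is a minimal sketch of replacing the tokenizer with one that only splits on single spaces, so pretokenized input survives unchanged. The class name `WhitespaceTokenizer` and the sample tokens are just for illustration, and this shows spaCy only; it is not an existing option in this library.

```python
import spacy
from spacy.tokens import Doc


class WhitespaceTokenizer:
    """Illustrative tokenizer: splits on single spaces and nothing else."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Assumes tokens contain no spaces and are separated by exactly one space.
        words = text.split(" ")
        return Doc(self.vocab, words=words)


# A blank pipeline avoids needing a downloaded model.
nlp = spacy.blank("en")
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)

# Pretokenized input: join with spaces so the custom tokenizer recovers it as-is.
tokens = ["don't", "split", "e.g.", "this"]
doc = nlp(" ".join(tokens))
recovered = [token.text for token in doc]
```

With the default English tokenizer, "don't" would be split into two tokens; with the swapped-in tokenizer the original token boundaries are kept.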
This would be helpful!
It would also be great if this could be used together with the aforementioned issue (https://github.com/TakeLab/spacy-udpipe/issues/13) so that you can pass pretokenized, presegmented text.