We have been using the fast TokenBuffer API to speed up various tokenizers in WordTokenizers.jl.
Referring to #141 and #140, I think it might be beneficial to extend the TokenBuffer API to the Document and Corpus types that TextAnalysis.jl offers (excluding NGramDocument and TokenDocument).
This could then be used to improve the performance of preprocessing.jl.
Edit: This could also serve as a solution for #74 and #76.
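For context, this is the general shape of the TokenBuffer pattern in WordTokenizers.jl: you drive a buffer over the input and compose small matcher functions (`spaces`, `character`, etc.). A minimal sketch of a whitespace tokenizer in this style (the function name `my_tokenizer` is just illustrative):

```julia
using WordTokenizers

# A minimal TokenBuffer-style tokenizer: skip whitespace,
# otherwise consume characters into the current token.
function my_tokenizer(input)
    ts = TokenBuffer(input)
    while !isdone(ts)
        spaces(ts) && continue  # flush current token on whitespace
        character(ts)           # append one character to the token
    end
    return ts.tokens
end
```

The idea would be to reuse matchers like these inside the preprocessing steps for Document and Corpus, rather than the current regex/split-based passes.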