JuliaText / TextAnalysis.jl

Julia package for text analysis

TokenBuffer for preprocessing Documents #143

Open Ayushk4 opened 5 years ago

Ayushk4 commented 5 years ago

We have been using a fast TokenBuffer API to speed up various tokenizers in WordTokenizers.jl.

With reference to #141 and #140, I think it might be beneficial to extend the TokenBuffer API to the Document and Corpus types that TextAnalysis.jl offers (excluding NGramDocument and TokenDocument). This could then be used to improve the performance of preprocessing.jl.
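
To make the idea concrete, here is a rough, self-contained sketch of what a buffer-style preprocessing pass could look like. It is not the WordTokenizers.jl TokenBuffer and not an existing TextAnalysis.jl API: `CharBuffer`, `at_end`, `strip_punctuation!`, `squeeze_whitespace!`, `keep_lowercase!` and `preprocess` are hypothetical names made up for illustration. The assumption is that several cleanup steps can share one scan over the text instead of each step making its own pass.

```julia
# Hypothetical sketch: none of these names exist in TextAnalysis.jl or WordTokenizers.jl.
mutable struct CharBuffer
    input::Vector{Char}   # characters of the original text
    output::Vector{Char}  # cleaned characters accumulated so far
    idx::Int              # position of the next unread character
end

CharBuffer(text::AbstractString) = CharBuffer(collect(text), Char[], 1)

# True once every input character has been consumed.
at_end(cb::CharBuffer) = cb.idx > length(cb.input)

# Drop the current character if it is punctuation; return whether it matched.
function strip_punctuation!(cb::CharBuffer)
    ispunct(cb.input[cb.idx]) || return false
    cb.idx += 1
    return true
end

# Collapse a run of whitespace into one space; return whether it matched.
function squeeze_whitespace!(cb::CharBuffer)
    isspace(cb.input[cb.idx]) || return false
    while !at_end(cb) && isspace(cb.input[cb.idx])
        cb.idx += 1
    end
    push!(cb.output, ' ')
    return true
end

# Fallback rule: copy the current character through in lowercase.
function keep_lowercase!(cb::CharBuffer)
    push!(cb.output, lowercase(cb.input[cb.idx]))
    cb.idx += 1
    return true
end

# Single pass over the text: the first rule that matches consumes input.
function preprocess(text::AbstractString)
    cb = CharBuffer(text)
    while !at_end(cb)
        strip_punctuation!(cb) || squeeze_whitespace!(cb) || keep_lowercase!(cb)
    end
    return String(cb.output)
end

println(preprocess("Hello,   World!"))  # prints "hello world"
```

Each rule either consumes input and returns `true`, or leaves the buffer untouched and returns `false`, so rules can be chained with `||`. Whether this actually beats the current approach in preprocessing.jl would need benchmarking.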

Edit: This could also serve as a solution for #74 and #76.