JuliaText / WordTokenizers.jl

High performance tokenizers for natural language processing and other related tasks

some cleanup, including changing TokenBuffer to use replaces rather than splits #19

Closed: oxinabox closed this 5 years ago

oxinabox commented 5 years ago

This is the change we were talking about in #10, and having made it, I think it is cleaner and easier to write.

I also think this kind of rule will be useful for #18, where some of the things you flush are normalized versions of the "raw tokens" (see the sketch below).
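For illustration, here is a minimal sketch (not the exact diff in this PR) of what a "replace"-style TokenBuffer rule could look like: instead of marking a split point, the rule flushes a normalized token in place of the raw text it matched. The rule `ascii_quote` and the wrapper `quote_normalising_tokenise` are hypothetical names invented for this example, and the `ts.input` / `ts.idx` field access assumes the current TokenBuffer struct layout; `TokenBuffer`, `flush!`, `character`, `spaces`, and `isdone` are the TokenBuffer API names from the package.

```julia
using WordTokenizers: TokenBuffer, flush!, character, spaces, isdone

# Hypothetical rule: if the current character is a curly double quote, emit a
# plain ASCII `"` token in its place; otherwise report no match so the other
# rules get a chance to run.
function ascii_quote(ts)
    ts.idx <= length(ts.input) || return false       # defensive bounds check
    ts.input[ts.idx] in ('“', '”') || return false
    flush!(ts, "\"")   # flush any pending buffer, then push the normalized token
    ts.idx += 1        # consume the raw character
    return true
end

# Hypothetical tokenizer wiring the rule into the usual TokenBuffer loop.
function quote_normalising_tokenise(input)
    ts = TokenBuffer(input)
    while !isdone(ts)
        spaces(ts) || ascii_quote(ts) || character(ts)
    end
    return ts.tokens
end

quote_normalising_tokenise("she said “hi”")  # expected: ["she", "said", "\"", "hi", "\""]
```

Flushing the replacement string directly like this is what would let the #18 case, emitting normalized forms of the raw tokens, fall out naturally.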

What do you think @MikeInnes ?