oxinabox closed this 6 years ago
Making it configurable is done. The next part requires Revtok to be registered: https://github.com/jekbradbury/Revtok.jl/issues/1
Actually, this doesn't even need any code added to the package. See my latest documentation addition: https://github.com/JuliaText/WordTokenizers.jl#example-setting-tokenizer--revtokjl
The result is basically the same as if I had used Require.jl to make Revtok an optional dependency, so I don't need to.
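For anyone wondering why a settable default removes the need for an optional dependency, here is a minimal sketch in plain Julia. It is not the actual WordTokenizers.jl source: `TOKENIZER`, `whitespace_tokenize`, and `char_tokenize` are hypothetical names made up for illustration, and `char_tokenize` just stands in for a tokenizer from some other package such as Revtok.jl.

```julia
# Hypothetical sketch of a selectable default tokenizer.
# Not the real WordTokenizers.jl implementation.

# A simple built-in default: split on whitespace.
whitespace_tokenize(s::AbstractString) = String.(split(s))

# Holds the currently selected tokenizer. Any callable that takes a string works,
# so the package never has to know about (or depend on) third-party tokenizers.
const TOKENIZER = Ref{Any}(whitespace_tokenize)

# Select a new default tokenizer.
function set_tokenizer(f)
    TOKENIZER[] = f
    return nothing
end

# Public entry point: forwards to whichever tokenizer is currently selected.
tokenize(s::AbstractString) = TOKENIZER[](s)

# Stand-in for a tokenizer defined in another package (hypothetical).
char_tokenize(s::AbstractString) = string.(collect(s))

default_tokens = tokenize("hello big world")  # uses whitespace_tokenize
set_tokenizer(char_tokenize)                  # caller supplies the function
custom_tokens = tokenize("abc")               # now uses char_tokenize
```

Because the caller loads the other package and passes its function in, the availability check happens on the caller's side when they run `using`, which is roughly what an optional dependency would have bought.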
From Slack:
jekbradbury [9:59 AM]
i still think the approach in github.com/jekbradbury/Revtok.jl makes sense for deep learning—aggressively ignore edge cases, and do the dumbest and simplest thing possible such that you can still reverse it perfectly in detokenization ...
Lyndon White [10:25 AM]
@jekbradbury do you think it would be a good idea for me to re-export Revtok.jl from WordTokenizers.jl? Hmmm, what I could do is a Plots.jl-like approach, where `tokenize` and `split_sentences` redispatch to a selectable default tokenizer. And if you set `revtok` as your tokenizer it checks to see if the package is available.
jekbradbury [10:31 AM]
Sure
jekbradbury [10:31 AM]
it's a very small package, but it does depend on DataStructures which is quite large