JuliaText / WordTokenizers.jl

High performance tokenizers for natural language processing and other related tasks

Reexport RevTok.jl #1

Closed oxinabox closed 6 years ago

oxinabox commented 6 years ago

From Slack:

jekbradbury [9:59 AM]

I still think the approach in github.com/jekbradbury/Revtok.jl makes sense for deep learning: aggressively ignore edge cases, and do the dumbest and simplest thing possible such that you can still reverse it perfectly in detokenization ...

Lyndon White [10:25 AM]

@jekbradbury do you think it would be a good idea for me to re-export Revtok.jl from WordTokenizers.jl? Hmm, what I could do is a Plots.jl-like approach, where `tokenize` and `split_sentences` redispatch to a selectable default tokenizer, and if you set Revtok as your tokenizer it checks to see whether the package is available.
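The Plots.jl-like approach above can be sketched as a module-level reference holding the currently selected tokenizer, which `tokenize` redispatches through. This is a minimal illustration of the idea only; the names `DEFAULT_TOKENIZER`, `set_tokenizer`, and the whitespace-split fallback are illustrative assumptions, not necessarily WordTokenizers.jl's actual internals:

```julia
# Sketch of a selectable default tokenizer (illustrative names, not
# the package's real implementation).
const DEFAULT_TOKENIZER = Ref{Function}(split)  # fallback: whitespace split

"Select the tokenizer function that `tokenize` will dispatch to."
set_tokenizer(f::Function) = (DEFAULT_TOKENIZER[] = f; nothing)

"Tokenize `text` with the currently selected default tokenizer."
tokenize(text::AbstractString) = DEFAULT_TOKENIZER[](text)

tokenize("a quick test")            # whitespace split: "a", "quick", "test"
set_tokenizer(s -> split(s, ","))   # swap in a different tokenizer
tokenize("a,b,c")                   # comma split: "a", "b", "c"
```

Because the selection is just a function value, any package's tokenizer can be plugged in without WordTokenizers.jl depending on it.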

jekbradbury [10:31 AM]

Sure

jekbradbury [10:31 AM]

it's a very small package, but it does depend on DataStructures which is quite large

oxinabox commented 6 years ago

Making the tokenizer configurable is done. The next part requires Revtok.jl to be registered: https://github.com/jekbradbury/Revtok.jl/issues/1

oxinabox commented 6 years ago

Actually, this doesn't even need any code added to the package. See my latest documentation addition: https://github.com/JuliaText/WordTokenizers.jl#example-setting-tokenizer--revtokjl

It works out basically the same as if I had used Requires.jl to declare an optional dependency, so I don't need to.
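The user-side setup described in the linked documentation needs no glue code in WordTokenizers.jl itself. A sketch, assuming Revtok.jl is installed and provides a `tokenize` function as discussed above:

```julia
# Sketch of the documented user-side pattern (assumes Revtok.jl is
# installed; WordTokenizers.jl itself never references Revtok).
using WordTokenizers
import Revtok

set_tokenizer(Revtok.tokenize)  # WordTokenizers' own setter
tokenize("Hello, world!")       # now dispatches to Revtok's tokenizer
```

Since the user loads both packages and simply passes one function to the other, there is no optional-dependency machinery to maintain.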