JuliaText / WordTokenizers.jl

High performance tokenizers for natural language processing and other related tasks

Fix TokTok.jl #29

Closed Ayushk4 closed 5 years ago

Ayushk4 commented 5 years ago

Refer to #28.

codecov-io commented 5 years ago

Codecov Report

Merging #29 into master will increase coverage by 18.39%. The diff coverage is 100%.


@@             Coverage Diff             @@
##           master      #29       +/-   ##
===========================================
+ Coverage   75.88%   94.27%   +18.39%     
===========================================
  Files          10       10               
  Lines         651      524      -127     
===========================================
  Hits          494      494               
+ Misses        157       30      -127
Impacted Files                        Coverage Δ
src/words/TokTok.jl                   100% <100%> (+19.48%)
src/sentences/sentence_splitting.jl   98.11% <0%> (+9.97%)
src/words/nltk_word.jl                100% <0%> (+10%)
src/words/fast.jl                     98.14% <0%> (+16.32%)
src/words/tweet_tokenizer.jl          91.15% <0%> (+19.11%)
src/words/sedbased.jl                 100% <0%> (+84.21%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f910c2c...00cdf78.

Ayushk4 commented 5 years ago

@oxinabox I believe this is done on my side; please review.

Ayushk4 commented 5 years ago

Here is one small difference between Julia 1.1 and 1.0 that caused an error in one of the newly added test cases (now fixed).

1.1

julia> collect.(Iterators.flatten(["–—", ("'", "’")]))
4-element Array{Array{Char,N} where N,1}:
 '–'   
 '—'   
 ['\'']
 ['’'] 

1.0

julia> collect.(Iterators.flatten(["–—", ("'", "’")]))
4-element Array{Array{Char,1},1}:
 ['–'] 
 ['—'] 
 ['\'']
 ['’'] 
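
One way to avoid depending on what `collect` returns for a lone `Char` is to convert each flattened item explicitly. The following is only a minimal sketch, not necessarily the fix applied in this PR; `as_chars` is a hypothetical helper name and is not part of WordTokenizers.jl.

```julia
# Hypothetical helper (an assumption for illustration, not WordTokenizers.jl API):
# always produce a Vector{Char}, whether the item is a Char or a String.
as_chars(x::AbstractChar) = [x]             # wrap a single character in a 1-element vector
as_chars(x::AbstractString) = collect(x)    # split a string into its characters

# Same input as in the REPL sessions above: the String is flattened into Chars,
# while the Tuple is flattened into its two Strings.
items = Iterators.flatten(["–—", ("'", "’")])

# A comprehension with an explicit conversion, instead of broadcasting `collect.`
result = [as_chars(x) for x in items]
```

Under this sketch every element of `result` is a one-dimensional character array, which matches the shape shown for 1.0 above.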

oxinabox commented 5 years ago

LGTM, thanks