JuliaText / TextAnalysis.jl

Julia package for text analysis

Unexpected behaviour of ngram(sd, 3) #202

Closed by aquatiko 4 years ago

aquatiko commented 4 years ago

I was expecting a Dict containing only trigrams when calling this, as mentioned in the docs. Instead I'm getting unigrams, bigrams, and trigrams.

julia> sd = StringDocument("To be or not to be...")
julia> ngrams(sd, 3)
Dict{AbstractString,Int64} with 14 entries:
  "to be"     => 1
  "not"       => 1
  "or not to" => 1
  "be or"     => 1
  "not to be" => 1
  "or"        => 1
  "not to"    => 1
  "To"        => 1
  "be or not" => 1
  "be"        => 2
  "To be"     => 1
  "or not"    => 1
  "to"        => 1
  "To be or"  => 1

Fixing this can help solve #201 next.
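
As a stopgap on a release that returns every n-gram up to length n, the trigrams can be filtered out of the returned Dict by counting the space-separated tokens in each key. This is only a sketch in plain Julia: `only_order` is a hypothetical helper, not part of TextAnalysis, and the counts are copied from the output above.

```julia
# Counts copied from the REPL output above (all n-grams up to length 3).
all_grams = Dict{String,Int}(
    "to be" => 1, "not" => 1, "or not to" => 1, "be or" => 1,
    "not to be" => 1, "or" => 1, "not to" => 1, "To" => 1,
    "be or not" => 1, "be" => 2, "To be" => 1, "or not" => 1,
    "to" => 1, "To be or" => 1,
)

# Hypothetical helper (not part of TextAnalysis): keep only the entries
# whose key consists of exactly n whitespace-separated tokens.
only_order(grams::AbstractDict, n::Integer) =
    filter(p -> length(split(first(p))) == n, grams)

trigrams = only_order(all_grams, 3)
```

Here `trigrams` keeps the four three-word keys and drops the unigrams and bigrams. This assumes tokens never contain spaces themselves.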

Ayushk4 commented 4 years ago

Are you working off the master branch?

aquatiko commented 4 years ago

Yes. I'm using the latest release too. Is that frozen for a dev branch?

aquatiko commented 4 years ago

It seems that I was not in sync with the latest version of TextAnalysis. I recently did a clean build of Julia 1.4.0 and used

pkg> add TextAnalysis

but this doesn't seem to install the version we see on the repo (maybe the registry needs to be updated). But after installing it directly from the repository,

pkg> add https://github.com/JuliaText/TextAnalysis.jl.git

this error seems to be gone. So are #201 and #193.
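
To see which version `add TextAnalysis` actually resolved to, and whether the environment is tracking the git repository instead of a registered release, the active environment can be inspected with Pkg. A minimal sketch, assuming Julia 1.4 or later (where `Pkg.dependencies()` is available as far as I know); `installed_info` is a hypothetical helper name, not a Pkg function.

```julia
using Pkg

# Hypothetical helper: look up a package by name in the active environment
# and report its resolved version and whether it tracks a git repository.
# Returns `nothing` if the package is not in the environment.
function installed_info(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        if info.name == name
            return (version = info.version,
                    tracking_repo = info.is_tracking_repo)
        end
    end
    return nothing
end

# Example call; the result depends on what is installed locally.
ta_info = installed_info("TextAnalysis")
```

If `tracking_repo` is `false` and the version is older than what the repository shows, the registry release is lagging behind master, which matches what was observed here.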

aquatiko commented 4 years ago

I'm closing this. #201 and #193 can be closed too. But the behavior of the following command still needs to be looked at, since it installs an outdated version:

pkg> add TextAnalysis

Thanks for the tip @Ayushk4 :)

Ayushk4 commented 4 years ago

Yes, we need to tag a new release for the package. It's been over a year since the last release. The release is waiting on #180. I will try to get that done in a week. Thanks.