JuliaText / TextAnalysis.jl

Julia package for text analysis

remove_corrupt_utf8 seems to want code from an old version of Julia #62

Closed mariosangiorgio closed 6 years ago

mariosangiorgio commented 6 years ago

The first time I executed it I got the following error:

julia> TextAnalysis.prepare!(corpus, prepare_flags)
ERROR: MethodError: no method matching zero(::Type{Char})
Closest candidates are:
  zero(::Type{Base.LibGit2.GitHash}) at libgit2/oid.jl:106
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:82
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:124
  ...
Stacktrace:
 [1] remove_corrupt_utf8(::String) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:46
 [2] remove_corrupt_utf8!(::TextAnalysis.StringDocument) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:58
 [3] remove_corrupt_utf8!(::TextAnalysis.Corpus) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:83
 [4] #prepare!#10(::Set{AbstractString}, ::Set{AbstractString}, ::Function, ::TextAnalysis.Corpus, ::UInt32) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:271
 [5] prepare!(::TextAnalysis.Corpus, ::UInt32) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:268

I'm using Julia 0.6 and I don't have any definition of zero(::Type{Char}). I'm not sure whether Char used to be a Number and got zero from there, or whether I'm missing an import.

In any case, if I define it:

julia> import Base.zero

julia> zero(Char) = ' '
zero (generic function with 17 methods)

I then get:

julia> TextAnalysis.prepare!(corpus, prepare_flags)
ERROR: UndefVarError: CharString not defined
Stacktrace:
 [1] remove_corrupt_utf8(::String) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:52
 [2] remove_corrupt_utf8!(::TextAnalysis.StringDocument) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:58
 [3] remove_corrupt_utf8!(::TextAnalysis.Corpus) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:83
 [4] #prepare!#10(::Set{AbstractString}, ::Set{AbstractString}, ::Function, ::TextAnalysis.Corpus, ::UInt32) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:271
 [5] prepare!(::TextAnalysis.Corpus, ::UInt32) at /Users/mariosangiorgio/.julia/v0.6/TextAnalysis/src/preprocessing.jl:268

and this suggests that CharString has been deprecated since Julia 0.3.

Am I doing something wrong, or is remove_corrupt_utf8 broken on recent versions of Julia?

aviks commented 6 years ago

Looks like a bug, will fix soon.

aviks commented 6 years ago

Fix in PR #64