JuliaText / TextAnalysis.jl

Julia package for text analysis

Unable to convert corpus to DataFrame #236

Closed: tk3369 closed this issue 3 years ago

tk3369 commented 3 years ago
    julia> crps = Corpus([StringDocument("hello world")])
    A Corpus with 1 documents:
     * 1 StringDocument's
     * 0 FileDocument's
     * 0 TokenDocument's
     * 0 NGramDocument's

    Corpus's lexicon contains 0 tokens
    Corpus's index contains 0 tokens

    julia> convert(DataFrame, crps)
    ERROR: MethodError: no method matching Array{Union{Missing, String},N} where N(::Int64)
    Closest candidates are:
      Array{Union{Missing, String},N} where N(::UndefInitializer, ::Int64) where T at boot.jl:420
      Array{Union{Missing, String},N} where N(::UndefInitializer, ::Int64, ::Int64) where T at boot.jl:421
      Array{Union{Missing, String},N} where N(::UndefInitializer, ::Int64, ::Int64, ::Int64) where T at boot.jl:422
      ...
    Stacktrace:
     [1] convert(::Type{DataFrame}, ::Corpus{StringDocument{String}}) at /Users/tomkwong/.julia/packages/TextAnalysis/FS0XI/src/corpus.jl:93
     [2] top-level scope at REPL[4]:1

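Maybe this is old code that broke with Julia 1.0? Julia 1.0 removed the Array{T}(n) constructor for uninitialized arrays. Those now require an explicit undef argument, and that is the call failing here (note that every candidate in the error takes ::UndefInitializer first): https://github.com/JuliaText/TextAnalysis.jl/blob/1cb2fd71cc38988db19be7b2d8f7543d4b4bdcb8/src/corpus.jl#L93-L98

I believe it should be: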

    # Julia 1.0+: uninitialized arrays need the explicit `undef` argument.
    # (Newer DataFrames versions spell column assignment df[!, :Language] = ...)
    df[:Language] = Array{Union{String,Missing}}(undef, n)
    df[:Title] = Array{Union{String,Missing}}(undef, n)
    df[:Author] = Array{Union{String,Missing}}(undef, n)
    df[:TimeStamp] = Array{Union{String,Missing}}(undef, n)
    df[:Length] = Array{Union{Int,Missing}}(undef, n)
    df[:Text] = Array{Union{String,Missing}}(undef, n)
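The constructor change can be reproduced in plain Julia, without TextAnalysis or DataFrames. The sketch below shows the pre-1.0 call failing, the undef form succeeding, and, as an alternative not proposed above, prefilling the column with missing. The variable names (old_failed, title_col, prefilled) are hypothetical and only serve as illustration:

```julia
# Sketch: reproduce the Julia 1.0 array-constructor change behind this error.
n = 2

# Pre-1.0 form used in corpus.jl. On Julia 1.x there is no
# Array{T}(::Int) method, so this raises a MethodError like the one above.
old_failed = try
    Array{Union{String,Missing}}(n)
    false
catch err
    err isa MethodError
end

# Julia 1.x form: allocate an uninitialized vector, then assign each slot.
title_col = Array{Union{String,Missing}}(undef, n)
title_col[1] = "hello world"
title_col[2] = missing

# Alternative (hypothetical, not from the issue): start every slot as missing.
prefilled = Array{Union{String,Missing}}(missing, n)

println(old_failed)                 # true
println(title_col)
println(all(ismissing, prefilled))  # true
```

The missing prefill is only an alternative. The undef form proposed above is enough as long as every row is assigned before the column is read.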