Upon trying to remove sparse terms from a corpus via
remove_sparse_terms!(corp, .05)
I run into the following error message:
PCRE compilation error: regular expression is too large at offset 592769
error(::String)@error.jl:33
compile(::String, ::UInt32)@pcre.jl:128
compile(::Regex)@regex.jl:79
Regex(::String, ::UInt32, ::UInt32)@regex.jl:44
Regex@regex.jl:67[inlined]
mk_regex(::String)@preprocessing.jl:31
_combine_regex(::Set{AbstractString})@preprocessing.jl:547
_build_regex(::Languages.English, ::UInt32, ::Set{AbstractString}, ::Set{AbstractString})@preprocessing.jl:542
var"#prepare!#14"(::Set{AbstractString}, ::Set{AbstractString}, ::typeof(TextAnalysis.prepare!), ::TextAnalysis.Corpus{TextAnalysis.StringDocument{String}}, ::UInt32)@preprocessing.jl:414
remove_words!@preprocessing.jl:227[inlined]
remove_sparse_terms!(::TextAnalysis.Corpus{TextAnalysis.StringDocument{String}}, ::Float64)@preprocessing.jl:341
top-level scope@Local: 18
Is this a bug, or could it just mean that something is wrong with one of the documents? That seems possible, since I'm dealing with patents, which can get pretty messy.
I'm on Julia 1.6.1 and TextAnalysis v0.7.3.