Closed: alexzandros closed this 8 months ago
Can you paste the stack trace you saw? Looks like a bug on our side.
I also experienced the same issue. The text in question contains "Less likely working with code I don’t like",
and the stack trace is:
ERROR: LoadError: StringIndexError: invalid index [38], valid nearby indices [36]=>'’', [39]=>'t'
Stacktrace:
[1] string_index_err(s::String, i::Int64)
@ Base ./strings/string.jl:12
[2] SubString{String}(s::String, i::Int64, j::Int64)
@ Base ./strings/substring.jl:32
[3] SubString
@ ./strings/substring.jl:38 [inlined]
[4] SubString
@ ./strings/substring.jl:44 [inlined]
[5] remove_patterns(s::SubString{String}, rex::Regex)
@ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:486
[6] remove_patterns!
@ ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:508 [inlined]
[7] remove_patterns!(crps::Corpus{StringDocument{SubString{String}}}, rex::Regex)
@ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:534
[8] prepare!(crps::Corpus{StringDocument{SubString{String}}}, flags::UInt32; skip_patterns::Set{AbstractString}, skip_words::Set{AbstractString})
@ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:415
[9] prepare!
@ ~/.julia/packages/TextAnalysis/B0QxG/src/preprocessing.jl:406 [inlined]
[10] summarize(d::StringDocument{String}; ns::Int64)
@ TextAnalysis ~/.julia/packages/TextAnalysis/B0QxG/src/summarizer.jl:22
[11] main()...
Not reproducible with Julia 1.9 and TextAnalysis 0.8
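For anyone hitting this on an older version: the index numbers in the error line up with the quoted text, because the curly apostrophe takes three bytes in UTF-8 and Julia string indices are byte offsets. Below is a minimal sketch in plain Base Julia (it does not call TextAnalysis, and the variable names are only for illustration) that shows why index 38 is rejected:

```julia
s = "Less likely working with code I don’t like"

# Julia string indices are byte offsets, not character counts.
@assert ncodeunits(s) == 44   # bytes
@assert length(s) == 42       # characters

# The apostrophe ’ (U+2019) occupies bytes 36, 37 and 38.
@assert s[36] == '’'
@assert isvalid(s, 36)
@assert !isvalid(s, 38)       # the index reported in the error
@assert nextind(s, 36) == 39
@assert s[39] == 't'

# Slicing with an end index inside the character raises the same error type.
err = try
    SubString(s, 36, 38)
catch e
    e
end
@assert err isa StringIndexError
```

This only reproduces the index arithmetic. Judging from the trace, something in remove_patterns probably computed an offset that landed inside the multi-byte character, but that is an inference, not something confirmed in this thread.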
I'm trying to create a StringDocument from a string that contains UTF-8 characters, and all I'm getting is a
StringIndexError
My code is as follows
And I get the following error
Followed by a stack trace.
So I need to know the best practice for working with UTF-8 strings.
Thanks in advance.
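On the general question, a few habits from Base Julia tend to avoid StringIndexError when a string has multi-byte characters. The sketch below uses only Base functions; the sample string `text` and the variable names are made up for illustration, and none of this is TextAnalysis's own code:

```julia
text = "naïve café"   # ï and é each take 2 bytes in UTF-8

# 1. Iterate over valid indices instead of 1:length(text).
idx = collect(eachindex(text))
@assert idx == [1, 2, 3, 5, 6, 7, 8, 9, 10, 11]

# 2. Move between characters with nextind / prevind, not i + 1 / i - 1.
@assert nextind(text, 3) == 5
@assert prevind(text, 5) == 3

# 3. Take prefixes and suffixes by character count.
@assert first(text, 3) == "naï"
@assert last(text, 4) == "café"

# 4. Snap an arbitrary byte offset back to the start of its character.
@assert thisind(text, 4) == 3

# 5. Turn a regex match into a valid index range.
m = match(r"café", text)
start = m.offset                                   # byte index where the match starts
stop = prevind(text, start + ncodeunits(m.match))  # start of the last matched character
@assert text[start:stop] == "café"
```

The last step is the one most relevant to pattern removal: adding a match's byte length to its offset and subtracting one does not always give a valid index, so stepping back with prevind is the safer choice.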