JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

`BoundsError` when joining `AnnotatedStrings` with distinct label orderings #54860

caleb-allen closed 5 days ago

caleb-allen commented 1 month ago

I think I've encountered a bug that occurs when joining AnnotatedStrings whose annotations were not constructed the way StyledStrings constructs them, specifically when the annotation labels are ordered differently between the strings.

With simple annotations on two AnnotatedString instances, join works as expected:

julia> import Base: AnnotatedString, annotatedstring, annotations, annotate!

julia> a = AnnotatedString("the quick fox ", [(1:14, :FOO => "bar")])
"the quick fox "

julia> b = AnnotatedString("jumped over the lazy dog", [(1:24, :FOO => "bar")])
"jumped over the lazy dog"

julia> annotations(a * b) # concat only, without joining the annotations
2-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:14, :FOO => "bar")
 (15:38, :FOO => "bar")

julia> annotations(join([a, b]))
1-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:38, :FOO => "bar")

However, joining the string a above with an annotated string whose labels were inserted in a different order throws a BoundsError:

julia> c = AnnotatedString("jumped over the lazy dog", [(1:5, :BAZ => "bar"), (1:24, :FOO => "bar")])
"jumped over the lazy dog"

julia> annotations(a * c)
3-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:14, :FOO => "bar")
 (15:19, :BAZ => "bar")
 (15:38, :FOO => "bar")

julia> annotations(join([a, c]))
ERROR: BoundsError: attempt to access 1-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}} at index [0]
Stacktrace:
  [1] throw_boundserror(A::Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}, I::Tuple{Int64})
    @ Base ./essentials.jl:14
  [2] getindex
    @ ./essentials.jl:892 [inlined]
  [3] _insert_annotations!(io::Base.AnnotatedIOBuffer, annotations::Vector{Tuple{UnitRange{…}, Pair{…}}}, offset::Int64)
    @ Base ./strings/annotated.jl:600
  [4] _insert_annotations!
    @ ./strings/annotated.jl:591 [inlined]
  [5] write
    @ ./strings/annotated.jl:499 [inlined]
  [6] print
    @ ~/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/share/julia/stdlib/v1.11/StyledStrings/src/io.jl:255 [inlined]
  [7] join(io::Base.AnnotatedIOBuffer, iterator::Vector{AnnotatedString{String}}, delim::String)
    @ Base ./strings/io.jl:352
  [8] join
    @ ./strings/io.jl:349 [inlined]
  [9] _join_preserve_annotations(::Vector{AnnotatedString{String}})
    @ Base ./strings/io.jl:359
 [10] join(iterator::Vector{AnnotatedString{String}})
    @ Base ./strings/io.jl:366
 [11] top-level scope
    @ REPL[30]:1
Some type information was truncated. Use `show(err)` to see complete types.

The BoundsError does not appear to occur if the "joined" annotation is ordered first in both strings (:FOO first in each):

julia> d = AnnotatedString("jumped over the lazy dog", [(1:24, :FOO => "bar"), (1:5, :BAZ => "bar")])
"jumped over the lazy dog"

julia> join([a, d])
"the quick fox jumped over the lazy dog"

julia> join([a, d]) |> annotations
2-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:38, :FOO => "bar")
 (15:19, :BAZ => "bar")

This may be related to #54561, as the stacktrace shows the join passing through StyledStrings (frame [6], print in StyledStrings/src/io.jl).

This bug is present on Julia 1.11.0-beta2, installed via juliaup:

julia> versioninfo()
Julia Version 1.11.0-beta2
Commit edb3c92d6a6 (2024-05-29 09:37 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_TEST_FAILFAST = true
  JULIA_PKG_PRESERVE_TIERED_INSTALLED = true

tecosaur commented 1 month ago

Thanks for the detailed bug report! I'll have a look at this on the weekend.