Open aplavin opened 1 month ago
In your example s1
and s2
are not equal because the first character of s1
has an annotation. This lines up with "Test whether two strings are equal character by character", but not the "(technically... )" bit.
Initially, I didn't have annotations matter when comparing an AnnotatedString
with an unannotated string, but that makes string equality non-transitive, which is a big no-no.
Not sure how annotations affect code points (docs: "Unicode code point by code point"). Also, the straightforward interpretation of "Strings are compared as sequences of characters, ignoring encoding" (from the same excerpt) seems to imply comparing, well, characters themselves :) No matter how you encode it (with or without annotations) it still is the same character, right?
The AbstractChar type is the supertype of all character implementations in
Julia. A character represents a Unicode code point, and can be converted to
an integer via the codepoint function in order to obtain the numerical value
of the code point, or constructed from the same integer. These numerical
values determine how characters are compared with < and ==, for example. New
T <: AbstractChar types should define a codepoint(::T) method and a
T(::UInt32) constructor, at minimum
that makes string equality non-transitive, which is a big no-no
Sure! I expected ==
to ignore annotations, following all these explanations in the docs.
Not sure how annotations affect code points
They don't, but the reasoning I put forwards is predicated on a character with annotations not being equal to a character without annotations.
Sure! I expected == to ignore annotations, following all these explanations in the docs.
We could do this, but I'd expect this behavior to be potentially surprising.
I think the situation can be summed up like so:
A string without annotations is equal to an unannotated string.
Annotated strings are equal when the underlying string is equal, as are annotations.
Loosen the "technically" to something like "often" in the ==(::AbstractString, ::AbstractString)
docstring.
Ignore annotations entirely when comparing AnnotatedString
s and other strings with ==
.
Just document
==(a::AnnotatedString, b::AnnotatedString) -> Bool
Test whether their strings and annotations are equal.
Ah yep, though maybe that would need to explicitly mention
==(a::AnnotatedString, b::AnnotatedString)
==(a::AbstractString, b::AnnotatedString)
==(a::AnnotatedString, b::AbstractString)
Yeah the current situtation is definitely weird:
julia> a = Base.AnnotatedString("abc", [(2:3, :x => 123)])
"abc"
julia> a[1] == 'a' && a[2] == 'b' && a[3] == 'c'
true
julia> a == "abc"
false
Ahh, I think that the AnnotatedChar
equality should be updated to match.
Don't make AnnotatedString
a subtype of AbstractString
This way AnnotatedString
is free to define whatever equality methods it wants without being constrained by the existing AbstractString
interface.
Half the point of AnnotatedString
is that it can be passed to functions that are written for an AbstractString
and don't know what to do with annotations and it "just works", automatically falling back to a plain un-annotated string :upside_down_face:
The issue is that as a Julia user, I have no idea which functions that use AbstractString
are going to suddenly have different behavior when passed an AnnotatedString
because of some strange overload or piracy. If you want a function to have fundamentally different behavior for AnnotatedString
why not just give the function a new name, for example, instead of ==
define isequal_annotated(a::AbstractString, b::AbstractString) = a == b && annotations(a) == annotations(b)
And define a fall back: annotations(::AbstractString) = Tuple{UnitRange{Int64}, Pair{Symbol, Any}}[]
Mmmm, I think there's an argument to be made for abandoning the inclusion of annotations in the concept of string equality, and require them to be checked separately if you do care.
I'm not sure if this is the path we should take, but that is the alternative to the current situation.
Docs:
Actual behavior:
Either the docs or the implementation are clearly wrong...