Open jakobnissen opened 11 months ago
A more descriptive function name than `encoding` would be better IMO, maybe `string_encoding`?
The return type should be a singleton type, not a `Symbol`. So, e.g., `struct StringEncodingUTF8 end`.
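To illustrate the singleton-type suggestion, here is a minimal sketch. Every name in it (`StringEncoding`, `StringEncodingUTF8`, `StringEncodingUnknown`, `string_encoding`, `count_ascii`) is hypothetical and not part of Base; it only shows how a trait that returns a singleton lets callers pick a byte-level method by dispatch instead of branching on a `Symbol` at runtime.

```julia
# Hypothetical names throughout; none of these exist in Base.
abstract type StringEncoding end
struct StringEncodingUTF8 <: StringEncoding end
struct StringEncodingUnknown <: StringEncoding end

# Trait function on types, with a conservative fallback.
string_encoding(::Type{<:AbstractString}) = StringEncodingUnknown()
string_encoding(::Type{String}) = StringEncodingUTF8()
string_encoding(::Type{SubString{String}}) = StringEncodingUTF8()
string_encoding(s::AbstractString) = string_encoding(typeof(s))

# Callers dispatch on the trait value.
count_ascii(s::AbstractString) = count_ascii(string_encoding(s), s)

# UTF-8 fast path: every byte below 0x80 is exactly one ASCII character.
count_ascii(::StringEncodingUTF8, s) = count(<(0x80), codeunits(s))

# Generic fallback: iterate characters, making no assumption about the bytes.
count_ascii(::StringEncoding, s) = count(isascii, s)
```

With a `Symbol` return value the same choice would need an `if`/`elseif` chain inside one method; with singleton types, a package can add a method for its own encoding without touching existing code.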
Perhaps we could also have another trait that would say whether the encoding is known-valid (if no, it may need further validation/parsing).
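A rough sketch of what such a second trait might look like, assuming hypothetical names (`EncodingValidity`, `KnownValid`, `MaybeInvalid`, `encoding_validity`, `checked_codeunits`) that do not exist in Base. The default is the conservative answer, since a `String` can hold bytes that are not valid UTF-8.

```julia
# Hypothetical validity trait; these names are illustrative only.
abstract type EncodingValidity end
struct KnownValid <: EncodingValidity end
struct MaybeInvalid <: EncodingValidity end

# Conservative default: assume the data may need validation.
encoding_validity(::Type{<:AbstractString}) = MaybeInvalid()
encoding_validity(s::AbstractString) = encoding_validity(typeof(s))

# Return the code units, validating first unless the type guarantees validity.
function checked_codeunits(s::AbstractString)
    if encoding_validity(s) isa MaybeInvalid && !all(isvalid, s)
        throw(ArgumentError("string data is not valid in its purported encoding"))
    end
    return codeunits(s)
end
```

A string type that validates its data at construction could define `encoding_validity` to return `KnownValid()` and skip the check.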
Bikeshedding a bit: if `String` can be "UTF-8" without being valid UTF-8, then `encoding` is a little strong as a name. It seems like the idea is "ostensible/purported/assumed encoding", not actual encoding.
I found some issues probably related:
And a discourse discussion from one of them:
https://discourse.julialang.org/t/what-is-the-interface-of-abstractstring/8937
I do a lot of high-performance programming with strings. When you do that, it's often more efficient to work on the underlying bytes. Luckily, Julia enables that with `codeunits`.

However, there is no way of knowing generically, given some `AbstractString`, what encoding it uses, i.e., what the result of `codeunits` means. That makes it difficult (impossible?) to write code using `codeunits` for `AbstractString`. Most implementations of `AbstractString` use UTF-8, such as `String`, `SubString{String}`, `StringView` (of StringViews.jl), the various types in InlineStrings.jl, and more. But this is not generally true.

I propose to include a trait function `encoding(::Type{<:AbstractString})::Symbol`. In Base, the default implementations should be: