Open Cat-sushi opened 3 years ago
This proposal is derived from the closed proposal #1428.
g"𠮷野".length returns 2 (grapheme clusters), not 3 (code units).
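For comparison, here is a minimal sketch of how those counts can be obtained today without the proposed literal. It assumes the characters package is listed as a dependency in pubspec.yaml; the helper name counts is hypothetical and only for illustration.

```dart
import 'package:characters/characters.dart';

/// Returns [code units, code points, grapheme clusters] for [s].
/// `counts` is a hypothetical helper, not part of any library.
List<int> counts(String s) =>
    [s.length, s.runes.length, s.characters.length];

void main() {
  // '𠮷' lies outside the BMP, so it occupies two UTF-16 code units.
  print(counts('𠮷野')); // [3, 2, 2]
  // A flag is two code points (four code units) but one grapheme cluster.
  print(counts('🇯🇵')); // [4, 2, 1]
}
```

The second string shows why runes alone are not enough for user-visible text: only the grapheme cluster count matches what a reader would call one character.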
I think it should be constant, but I'm not sure it is a good idea. So, I changed the title.
The prefix naming scheme must be coordinated with #886 and any other related proposals.
I don't know that the g"str" syntax is necessarily in line with Dart style conventions to this point, though there is precedent in Rust's byte string literal syntax b"str". I might prefer to simply be able to access "words".characters or "words".clusters; that's pretty much how it's handled now with codeUnits and runes.
I agree that the characters package should be included as a core package; it provides a fundamental functionality, and it's a lot easier to import "dart:characters" than go to pubspec.yaml, include characters, come back to my file, import the package, and remember why I needed it in the first place.
> I might prefer to simply be able to access "words".characters or "words".clusters; that's pretty much how it's handled now with codeunits and runes
There is a proposal by a core member to introduce a single code point constant (but not a sequence of code points) with a similar syntax.
See #886, which mentions the need for such a literal.
"words".characters already exists; it returns an Iterable view of the String.
A sequence of code units is the default representation of String, and String natively provides a code-unit-based API.
On the other hand, String.codeUnits generates a List<int> in which every 16-bit code unit is represented as a 64-bit int, which serves quite a different purpose from that of Characters.
As you said, grapheme clusters are fundamental, so I think they deserve a literal.
Characters cs = '𠮷野'; // lint : omit_local_variable_types
can be rewritten as
var cs = g'𠮷野';
If we move Characters into the platform libraries, then adding a literal for creating (effectively) const Characters(stringLiteral) seems reasonable.
I'm also sure that some will argue that Characters should be the default string literal, and you'd have to write u16"...." to get the current string. (Then u8"...." could be UTF-8 encoded.)
That's a tough sell, though.
@lrhn
> I'm also sure that some will argue that Characters should be the default string literal, and you'd have to write u16"...." to get the current string. (Then u8"...." could be UTF-8 encoded). That's a tough sell, though.
I know. I'm not asking for that much.
It might be nice to have a lint discouraging people from using String.length too. It's almost never what they really want.
I can assure you, as someone who's written quite a lot of small parsers, that String.length is exactly what I want when I traverse the code units of a string. Parsing JSON, or integer literals, or URLs, or XML, or any other structured textual input which is commonly stored as a String, is quite different from handling user-written text. The Dart String class contains both. The API just happens to be better suited for the former.
A Dart String is a sequence of code units. Any abstraction on top of that is a separate class (Runes, Characters). You can, and should, choose the abstraction you need, but sometimes "sequence of code units" is the abstraction level you need.
A String is not only for text - words and phrases intended to be displayed as such. It supports that as well.
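As an illustration of the code-unit traversal described above, here is a minimal sketch of a decimal integer parser that relies on String.length and codeUnitAt. The function name parseDecimal is hypothetical, and the sketch assumes non-negative input and ignores overflow.

```dart
/// Parses a non-negative decimal integer by walking UTF-16 code units.
/// Returns null if [source] is empty or contains a non-digit.
/// `parseDecimal` is a hypothetical helper; overflow is not handled.
int? parseDecimal(String source) {
  const zero = 0x30; // code unit of '0'
  if (source.isEmpty) return null;
  var value = 0;
  // String.length counts code units, which is the right bound for this loop.
  for (var i = 0; i < source.length; i++) {
    final digit = source.codeUnitAt(i) - zero;
    if (digit < 0 || digit > 9) return null;
    value = value * 10 + digit;
  }
  return value;
}

void main() {
  print(parseDecimal('2048')); // 2048
  print(parseDecimal('20x8')); // null
}
```

For structured input like this, grapheme clusters never come into play, so indexing by code unit is both sufficient and cheap.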
@dnfield @lrhn
> String.length is exactly what I want when I traverse the code units of a string
Yes.
The problem is that String.length is too exposed to average programmers.
So, deprecating String.length and introducing String.size might be a solution.
But that was the discussion in #1428.
This proposal is only about the literal and dart:core.
Currently, grapheme clusters (Characters) are the only way to manipulate natural languages correctly. So, I propose syntax for grapheme cluster literals like g"𠮷野". It might include a proposal that the characters extension must be a part of dart:core.