Closed segeljakt closed 3 years ago
ASCII-only does not cut it in the 21st century :) A linked-list representation is horribly memory-inefficient, and ropes have too much overhead for the small strings we can expect in a program (format strings, status messages, etc.). So I'd say that std::string::String is the right compromise.
I agree we should go for std::string::String. Should it be ephemeral (mutable cell) or persistent (clone on write)? Clone on write is a bit brutal: when you just want to append one character, you have to clone the entire string.
How will it look at the arc-script level? As arc-script is SSA, CoW will probably create the fewest surprises, but it is, as you say, not very efficient when doing incremental changes. We could use CoW for now and, when/if it becomes an actual performance problem, move it to a native arc-dialect type and implement an optimization which converts operations into destructive updates where that can be done safely.
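For reference, here is a minimal Rust sketch of that idea on the host side, assuming the string lives in an Rc<String>. The function name `append` is hypothetical and this is not the actual arc-script runtime. `Rc::make_mut` clones the buffer only when it is shared, so an append on a uniquely owned string is already a destructive update.

```rust
use std::rc::Rc;

/// Hypothetical helper: appends `c` to a reference-counted string.
/// If `s` is the only owner, the buffer is mutated in place.
/// If other owners exist, `Rc::make_mut` clones the string first (clone on write).
fn append(mut s: Rc<String>, c: char) -> Rc<String> {
    Rc::make_mut(&mut s).push(c);
    s
}
```

So passing a clone of a shared handle leaves the original value untouched, while passing the last handle by value reuses its allocation.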
At the arc-script level it would look like:
val x = "hello worl";
val y = x.append('d');
Eventually, when the type system is more advanced, we could create syntactic sugar:
val x = "hello worl";
val y = x + 'd';
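On the Rust side, the same sugar could be mirrored with operator overloading. This is only a sketch: the `Str` type and its `append` method are hypothetical names, and it says nothing about how arc-script itself would type the `+`.

```rust
use std::ops::Add;
use std::rc::Rc;

/// Hypothetical host-side string type: shared and immutable.
#[derive(Clone, Debug, PartialEq)]
struct Str(Rc<String>);

impl Str {
    fn new(s: &str) -> Self {
        Str(Rc::new(s.to_owned()))
    }

    /// Persistent append: builds a new string and leaves `self` untouched.
    fn append(&self, c: char) -> Str {
        let mut out = String::with_capacity(self.0.len() + c.len_utf8());
        out.push_str(&self.0);
        out.push(c);
        Str(Rc::new(out))
    }
}

/// `&x + 'd'` is sugar for `x.append('d')`.
impl Add<char> for &Str {
    type Output = Str;

    fn add(self, c: char) -> Str {
        self.append(c)
    }
}
```

With this in place, `&x + 'd'` and `x.append('d')` produce equal values and `x` stays valid afterwards.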
This commit adds support for strings. Strings are on the surface represented as abstract data types and implemented using Rust's std::string::String. What do you think, is this a good strategy? From what I see there are four possible representations:
- UTF-8 strings (std::string::String)
- ASCII strings (std::vec::Vec<u8>)
- Linked lists (std::collections::LinkedList<char>)
- Ropes
Both lists and ropes are persistent (versioned). However, lists are horrible without lazy evaluation. In my implementation I wrapped an ephemeral UTF-8 std::string::String inside a reference counter for immutable sharing. If two strings are used in an operation like concatenation, then a new string is allocated for the result instead of mutating one of the strings. Ropes are smarter in that they do not have to reallocate, but they probably come with some overhead.
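Roughly, the wrapper described above could look like the following sketch. The names `SharedStr`, `new`, `concat` and `as_str` are made up for illustration and the actual commit may differ.

```rust
use std::rc::Rc;

/// Hypothetical wrapper: a UTF-8 `String` behind a reference counter.
/// Cloning a `SharedStr` only bumps the count; the bytes are shared.
#[derive(Clone, Debug, PartialEq)]
struct SharedStr(Rc<String>);

impl SharedStr {
    fn new(s: &str) -> Self {
        SharedStr(Rc::new(s.to_owned()))
    }

    /// Concatenation allocates a fresh buffer for the result,
    /// so neither operand is mutated.
    fn concat(&self, other: &SharedStr) -> SharedStr {
        let mut out = String::with_capacity(self.0.len() + other.0.len());
        out.push_str(&self.0);
        out.push_str(&other.0);
        SharedStr(Rc::new(out))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

The cost is one allocation and a full copy per concatenation, which is the trade-off against ropes mentioned above.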