cda-group / arc

Programming Language for Continuous Deep Analytics
https://cda-group.github.io/arc/

Add support for strings #304

Closed segeljakt closed 3 years ago

segeljakt commented 3 years ago

This commit adds support for strings. On the surface, strings are represented as an abstract data type, and they are implemented using Rust's std::string::String. What do you think, is this a good strategy? From what I see there are four possible representations:

Both lists and ropes are persistent (versioned). However, lists are horrible without lazy evaluation. In my implementation I wrapped an ephemeral UTF-8 std::string::String inside a reference-counted pointer for immutable sharing. If two strings are used in an operation like concatenation, a new string is allocated for the result instead of mutating one of the operands. Ropes are smarter in that they avoid reallocating, but they probably come with some overhead.
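The wrapping described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the code from the commit: the names `Str`, `concat`, `as_str`, and `handles` are hypothetical, and it assumes a single-threaded `Rc` rather than whatever pointer type the runtime actually uses.

```rust
use std::rc::Rc;

/// Hypothetical immutable string handle: a reference-counted UTF-8 buffer.
/// Cloning a `Str` only bumps the reference count; the bytes are shared.
#[derive(Clone, Debug, PartialEq)]
pub struct Str(Rc<String>);

impl Str {
    pub fn new(s: &str) -> Self {
        Str(Rc::new(s.to_owned()))
    }

    /// Concatenation allocates a fresh buffer for the result.
    /// Neither operand is mutated, so existing handles stay valid.
    pub fn concat(&self, other: &Str) -> Str {
        let mut buf = String::with_capacity(self.0.len() + other.0.len());
        buf.push_str(&self.0);
        buf.push_str(&other.0);
        Str(Rc::new(buf))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Number of handles currently sharing this buffer.
    pub fn handles(&self) -> usize {
        Rc::strong_count(&self.0)
    }
}
```

Sharing is cheap here, but every operation that produces a new value pays for a full copy of its inputs, which is the cost a rope would avoid.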

frej commented 3 years ago

> What do you think, is this a good strategy?

ASCII-only does not cut it in the 21st century :) A linked-list representation is horribly memory inefficient, and ropes have too much overhead for the small strings we can expect in a program (format strings, status messages, etc.). So I'd say that std::string::String is the right compromise.

segeljakt commented 3 years ago

I agree we should go for std::string::String. Should it be ephemeral (mutable cell) or persistent (clone on write)? Clone on write is a bit brutal: when you just want to append one character, you have to clone the entire string.

frej commented 3 years ago

How will it look at the arc-script level? As arc-script is SSA, CoW will probably create the fewest surprises, but it is, as you say, not very efficient for incremental changes. We could use CoW for now and, if it turns out to be an actual performance problem, move strings to a native arc-dialect type and implement an optimization that converts operations into destructive updates where that can be done safely.
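For comparison, Rust's standard library offers a runtime form of this trade-off through `Rc::make_mut`, which clones the buffer only when it is shared. The sketch below is an illustration under that assumption, not the compile-time optimization proposed above, and the function name `append` is hypothetical.

```rust
use std::rc::Rc;

/// Hypothetical append with clone-on-write semantics.
/// Takes the handle by value and returns the updated handle.
pub fn append(mut s: Rc<String>, c: char) -> Rc<String> {
    // `Rc::make_mut` clones the inner String only if other handles to it
    // exist; a uniquely owned buffer is mutated in place (a destructive
    // update that is safe because nobody else can observe it).
    Rc::make_mut(&mut s).push(c);
    s
}
```

When the old value is still referenced elsewhere, the append pays for a full clone; when the caller gives up its only handle, the same call degenerates to an in-place push.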

segeljakt commented 3 years ago

At the arc-script level it would look like:

val x = "hello worl";
val y = x.append('d');

Eventually, when the type system is more advanced, we could add syntactic sugar:

val x = "hello worl";
val y = x + 'd';
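On the Rust side, the same sugar could be mirrored with operator overloading, so that `+` simply forwards to `append`. This is only a sketch of the idea: `Str` and its methods are hypothetical names, and how arc-script itself would desugar `+` is not specified in this thread.

```rust
use std::ops::Add;
use std::rc::Rc;

/// Hypothetical immutable string handle (not Arc's actual runtime type).
#[derive(Clone, Debug, PartialEq)]
pub struct Str(Rc<String>);

impl Str {
    pub fn new(s: &str) -> Self {
        Str(Rc::new(s.to_owned()))
    }

    /// Returns a new string with `c` appended; `self` is left unchanged.
    pub fn append(&self, c: char) -> Str {
        let mut buf = String::with_capacity(self.0.len() + c.len_utf8());
        buf.push_str(&self.0);
        buf.push(c);
        Str(Rc::new(buf))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// `&x + 'd'` forwards to `x.append('d')`.
impl<'a> Add<char> for &'a Str {
    type Output = Str;

    fn add(self, c: char) -> Str {
        self.append(c)
    }
}
```

With this in place, `&x + 'd'` and `x.append('d')` produce equal values, and `x` remains usable afterwards.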