carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/

Ownership and string representation #2065

Open OlaFosheimGrostad opened 2 years ago

OlaFosheimGrostad commented 2 years ago

The design doc currently states "The right model of a string view versus an owning string is still very much unsettled."

Other issues, such as string interpolation, will need some clarity on which string representations and ownership models Carbon will support.

Maybe it would be a good idea to map out this landscape and see if there is some kind of unifying scheme or shared protocol that can be used to bring it all together in a flexible and efficient manner?

There seem to be many facets to this design issue:

L4stR1t3s commented 2 years ago

Would it be desirable to have a common generic ownership type that can be used for strings, pointers and file-descriptors?

If possible, without negatively affecting speed and/or memory usage, yes.

Should a string-owner also support a fixed size short-string optimization?

Sounds like an implementation detail to me, not something that the spec should define.

Should Carbon support a rope representation for large mutable strings? https://en.wikipedia.org/wiki/Rope_(data_structure)

This is a tree of strings. IMO a standard library should offer implementations of core concepts, not abstractions of those concepts. So that means a tree and a string in this case. Developers can implement a rope from those, and design it to fit their specific needs.
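To illustrate the "tree plus string" point, here is a minimal C++ sketch of a rope assembled from standard building blocks. The names `RopeNode`, `Rope`, `MakeLeaf`, `Concat`, `At` and `Flatten` are hypothetical and are not part of any Carbon or C++ library; balancing, splitting and mutation are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical rope node: a leaf holds text, an inner node joins two subtrees.
struct RopeNode {
  std::string leaf;                       // used only when both children are null
  std::shared_ptr<const RopeNode> left;   // inner nodes always have both children
  std::shared_ptr<const RopeNode> right;
  std::size_t length = 0;                 // total characters at or below this node
};

using Rope = std::shared_ptr<const RopeNode>;

Rope MakeLeaf(std::string text) {
  auto node = std::make_shared<RopeNode>();
  node->length = text.size();
  node->leaf = std::move(text);
  return node;
}

// Concatenation allocates one node and copies no character data.
Rope Concat(Rope a, Rope b) {
  auto node = std::make_shared<RopeNode>();
  node->length = a->length + b->length;
  node->left = std::move(a);
  node->right = std::move(b);
  return node;
}

// Index lookup walks down the tree using the cached lengths.
char At(const Rope& rope, std::size_t index) {
  assert(rope && index < rope->length);
  const RopeNode* node = rope.get();
  while (node->left) {
    if (index < node->left->length) {
      node = node->left.get();
    } else {
      index -= node->left->length;
      node = node->right.get();
    }
  }
  return node->leaf[index];
}

// Helper for Flatten: appends the leaves in left-to-right order.
void AppendTo(const Rope& rope, std::string& out) {
  if (!rope->left) {
    out += rope->leaf;
    return;
  }
  AppendTo(rope->left, out);
  AppendTo(rope->right, out);
}

// Flattening to a contiguous string is the step that copies every character.
std::string Flatten(const Rope& rope) {
  std::string out;
  out.reserve(rope->length);
  AppendTo(rope, out);
  return out;
}
```

In this sketch the nodes are immutable and shared, so several ropes can reuse the same subtrees. Whether that trade-off suits a given application is exactly the kind of decision the comment above suggests leaving to developers.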

Should there be a difference between read only representations, mutable representations, appendable representations?

Only if a separate implementation offers a noticeable difference in speed and/or memory usage IMO. Otherwise they are just abstractions of a core concept.

What should the relationship between Carbon and C++ string, string_view and span be?

I would expect that a Carbon string/string_view/span/... can be created with a simple one-to-one copy of the data of a std::string/std::string_view/std::span and vice versa. If any data conversion is necessary, I would consider the overhead of that unacceptable.
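For the non-owning case, the expectation above can be sketched in C++ as follows. `ForeignStringView`, `ToForeign` and `FromForeign` are hypothetical names standing in for whatever view type sits on the other side of the language boundary; nothing here reflects an actual Carbon design. The sketch goes through `data()` and `size()` rather than copying the object bytes, because the C++ standard does not specify the in-memory layout of `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a view type on the other side of the boundary:
// a pointer to the first character plus a length, with no ownership.
struct ForeignStringView {
  const char* data;
  std::size_t size;
};

// Constant-time conversion: only the pointer and the length are copied.
inline ForeignStringView ToForeign(std::string_view sv) noexcept {
  return ForeignStringView{sv.data(), sv.size()};
}

// Constant-time conversion back; the characters themselves are never touched.
inline std::string_view FromForeign(ForeignStringView fv) noexcept {
  assert(fv.data != nullptr || fv.size == 0);
  return std::string_view(fv.data, fv.size);
}
```

Both directions copy two machine words regardless of string length. An owning `std::string` is a harder case: its layout (including any short-string buffer) is implementation-defined, and its public API offers no way to adopt or release a heap buffer, so a copy-free handoff of ownership would need cooperation from the standard library implementation.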

Do we need to consider C++ third party library string types?

No. If they don't already offer conversion functionality to STL strings, it's usually not hard to add. If there is a need for it, I am sure people will write libraries for it. Carbon should focus on interoperability with the C++ language and the STL. Anything else can and should be derived from that.

OlaFosheimGrostad commented 2 years ago

Thank you for the feedback. I think I should rewrite the issue so that the protocol question comes first, at the top. I guess the question could be rephrased to something like "can we devise a protocol/scheme that can provide a performant API to many different string representations?"
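One possible shape for such a protocol, sketched in C++ purely as an illustration: every representation exposes its contents as a sequence of contiguous chunks, and generic algorithms are written once against that. `ChunkedString`, `ForEachChunk`, `Length`, `CountChar` and `ToContiguous` are hypothetical names, and this is only one of many conceivable designs, not something proposed in the Carbon design docs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical non-contiguous representation (e.g. the leaves of a rope).
struct ChunkedString {
  std::vector<std::string> pieces;
};

// The protocol: ForEachChunk(value, fn) calls fn once per contiguous run of
// characters, in order. Contiguous strings are a single chunk.
template <typename Fn>
void ForEachChunk(std::string_view s, Fn&& fn) {
  fn(s);
}

template <typename Fn>
void ForEachChunk(const ChunkedString& s, Fn&& fn) {
  for (const std::string& piece : s.pieces) {
    fn(std::string_view(piece));
  }
}

// Generic algorithms see only chunks, never the concrete representation.
template <typename Str>
std::size_t Length(const Str& s) {
  std::size_t n = 0;
  ForEachChunk(s, [&](std::string_view chunk) { n += chunk.size(); });
  return n;
}

template <typename Str>
std::size_t CountChar(const Str& s, char c) {
  std::size_t n = 0;
  ForEachChunk(s, [&](std::string_view chunk) {
    for (char ch : chunk) {
      if (ch == c) ++n;
    }
  });
  return n;
}

template <typename Str>
std::string ToContiguous(const Str& s) {
  std::string out;
  ForEachChunk(s, [&](std::string_view chunk) { out.append(chunk); });
  assert(out.size() == Length(s));
  return out;
}
```

The chunk callback is resolved at compile time, so a contiguous string pays for one direct call while chunked representations avoid being flattened first. Random access and mutation would need further operations that this sketch does not attempt.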

L4stR1t3s commented 2 years ago

I would also like to see the possibility of a string (and other data structures) sharing the same memory in C++ and Carbon. I don't think that will be possible by using the regular Carbon and C++ STL data structures, because their internals might change and can be platform/architecture/... dependent. But maybe a separate combined STL specifically designed for this could be possible? It would be useful to avoid duplication of large chunks of data being passed from Carbon to C++ and vice versa.

OlaFosheimGrostad commented 2 years ago

My personal opinion is that Carbon could maintain a patch set for Clang and the Clang standard library. That could allow Carbon to do interesting optimizations with data structures originating in C++ land (ARC of shared_ptr, string optimizations, etc.). I don't expect that to happen, but getting good performance with less work could be a good reason to switch from C++ to Carbon IMO.

github-actions[bot] commented 1 year ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time. This issue is labeled inactive because the last activity was over 90 days ago.