cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

Share immutable chunks externally #47

Closed calmofthestorm closed 2 years ago

calmofthestorm commented 2 years ago

Is it possible to get a copy-on-write reference to a chunk? Or alternatively, a rope that guarantees it only ever consists of a single chunk?

The context here is a struct which needs a memory contiguous string for some operations, but to produce a rope containing that string for other operations. Currently, I am storing both a Rope and a String that have the same content, but this isn't ideal.

cessen commented 2 years ago

The context here is a struct which needs a memory contiguous string for some operations, but to produce a rope containing that string for other operations.

I'm not 100% sure I understand your use case here. Do you mean you want something that's (essentially) actually a String but that you can pass to functions as if it were a Rope (presumably just for read-only operations)?

calmofthestorm commented 2 years ago

That's essentially it. As a (simplified) analogy, if I consider Rope a list of Arc<String>, I'd like to be able to write code that shares the Arc<String> with the Rope. This means I can pass &str to a function which requires contiguous storage, but can also append that same string to a Rope, without needing double the memory.

At present, I either double store or have a function that checks if myrope.chunks().count() is >1, in which case it calls to_string, but if it is == 1, I can instead just pass the single chunk to a function that takes &str. This works well in practice because Rope tends to keep them in a single chunk in this size range.

Having looked through the design more, however, I think I can understand why this might not work that well. AIUI, if this were implemented, the Rope might need to change the node in subsequent operations, causing a copy-on-clone, so now it and my code each have their own copy, with an end result of double storage anyway in many cases. Workarounds are presumably possible, but would introduce other complexity. Does that sound right?

cessen commented 2 years ago

Ah, I see. So you don't want a String that can behave as a Rope, rather you want a shared reference into a contiguous part of a Rope but that doesn't prevent the Rope itself from being mutated. Is that right?

(Sorry for the back-and-forth. I just want to make sure I understand what you're aiming for before I try to suggest things, ha ha.)

calmofthestorm commented 2 years ago

Yes, that is correct.

cessen commented 2 years ago

Got it. In that case, I think what you're really looking for is some kind of Rope clone. The trivial approach would be to just clone the whole Rope, but then you'll get a lot of duplicated chunks anyway as the original Rope gets modified. Ideally what you want is a way to tell Ropey "make a Rope clone out of just this chunk". At the moment there isn't a way to do this, but implementing it would be pretty straightforward.

Having said that, I'd rather not grow Ropey's API surface any further if I can reasonably avoid it, particularly for something that (at least at first glance) seems kind of niche. So what kind of memory savings are we talking about here? Are you making tens of thousands of these (currently cloned) strings?

Edit: and how large are these strings typically? Are they usually about as large as a chunk (on the order of 500-1000 bytes)?

calmofthestorm commented 2 years ago

Looking at the distribution of sizes, there are high 1,000s - low 10,000s of strings in question, but essentially all are under 500 bytes. As such, it seems easy enough for me to just write a wrapper around the rope that checks the chunk count and saves just the rope if it is 1, or both the rope and a string if it is more. This is essentially a type that does the same thing as several of my existing processing functions that special case the 1-chunk rope. Duplicating the occasional odd long string seems fine.

cessen commented 2 years ago

Yeah, that sounds like a reasonable approach. Sorry I couldn't be of any more help. If you run into anything else, please don't hesitate to file another issue!

calmofthestorm commented 2 years ago

It seems to work so far, so I'm making progress and I'm happy. Thanks for taking the time to understand my use case thoroughly and help identify the best way forward. Also, FWIW, I appreciate avoiding scope creep in APIs, and doing that means making hard calls on what to include.