cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

Build rope from slices #95

Closed truchi closed 3 months ago

truchi commented 3 months ago

Hello!

I am looking for a way to concatenate rope slices, i.e. impl FromIterator<RopeSlice> for Rope. Concatenation is the big strength of ropes and you probably have it internally but I don't see how to do this with the public API (maybe Builder but I need structural sharing).

My use case: editor's copy/paste/edit when selection is "big". Am I approaching this right? How much bytes/chunks would "big" be to see benefits over inserting in a loop?

Thank you ❤️ 🪢

truchi commented 3 months ago

Oh wait, there's Rope::split_off() and Rope::append()!

truchi commented 3 months ago

(While I'm at it, are you considering WeakRopes for v2?)

cessen commented 3 months ago

I'm not sure what you mean by WeakRope. Can you elaborate?

My use case: editor's copy/paste/edit when selection is "big". Am I approaching this right? How much bytes/chunks would "big" be to see benefits over inserting in a loop?

I did some quick tests on my machine. Keep in mind that these are just to give a rough idea, because all the tests started with an empty rope. But I think the numbers should be in the right ball park. All tests progressively insert the insertion text in a loop, 500 bytes at a time.

So I'd say as long as you aren't expecting your users to regularly paste more than 100 MB of text, it probably doesn't matter much.

Also, my personal opinion on this is that since copy/paste can also happen from e.g. the system clipboard, where you can't optimize with split/append anyway, it's better to focus on making sure the UI remains responsive during long operations if you're worried about copy/pasting text that large. You can't guarantee where that text data is coming from, and ideally the user experience should be good regardless.

truchi commented 3 months ago

Hello,

I'm not sure what you mean by WeakRope. Can you elaborate?

pub struct WeakRope {
    pub(crate) root: std::sync::Weak<Node>,
}

So I can:

/// A `(index , line, column, width)` cursor.
///
/// Values for `line`, `column` and `width` are cached
/// and recomputed only if [`Rope::is_instance()`] is `false`.
#[derive(Default)]
pub struct CachedCursor {
    index: usize,
    line: Cell<Option<usize>>,
    column: Cell<Option<usize>>,
    width: Cell<Option<usize>>,
    rope: Cell<WeakRope>,
}

(I'm probably over engineering this) (see)

I did some quick tests on my machine.

Not sure I understand. rope.internal_insert(random_index, some_str)? I use 6 * MAX_BYTES for "big" because of this comment, but anyway you are right it shouldn't matter that much for usual cases.

Thanks for your answer!

cessen commented 3 months ago
pub struct WeakRope {
   pub(crate) root: std::sync::Weak<Node>,
}

Ah, got it. I have no plans to add support for anything like that to Ropey, no. To be honest, I kind of regret adding the is_instance() method as well, and I likely won't be reproducing it in Ropey 2 unless someone can make a very compelling argument for it. Rope clones in Ropey are supposed to have full copy semantics: semantically they have nothing to do with each other from the moment of creation. They just happen to be abnormally efficient in terms of time and space.

Instance/reference semantics can still be built on top of Ropey, the same way you would build them on top of any other type. For example, by simply wrapping them in an Arc yourself.

cessen commented 3 months ago

Not sure I understand. rope.internal_insert(random_index, some_str)?

Ah, no, I mean just calling rope.insert() in a loop, with 500 bytes of text at a time. You were asking about optimizing text copy/paste by tree splicing (split_off() and append()) vs just directly inserting the text. If you were trying to paste via directly inserting from another rope or rope slice, that would mean looping over their chunks and inserting them one at a time. I was trying to roughly emulate that.

For me the bigger issue, though, is simply that specifically optimizing for copy/paste from within the application seems odd. At least, when I'm using a text editor, a fair bit of the copy/pasting I do actually comes from sources outside the editor. So IMO special-casing your copy/paste code for internal copy/pastes doesn't seem worth it...? But maybe I'm missing something.

truchi commented 3 months ago

I have no plans to add support for anything like that to Ropey, no. To be honest, I kind of regret adding the is_instance() method as well, and I likely won't be reproducing it in Ropey 2 unless someone can make a very compelling argument for it.

😭. I hope someone has a good argument! I don't want to deal with Arc<RefCell<Rope>> (god forbid, Arc<Mutex<Rope>>).

I mean just calling rope.insert() in a loop

Ah ok, I though you were talking about benches you previously did. Thanks, by the way! I'll keep my breakpoint at 6 * MAX_BYTES then.

the copy/pasting I do actually comes from sources outside the editor [...] But maybe I'm missing something.

Wow, I almost never do that. I actually want to optimise space by storing ropes in the undo/redo stack (when "big"). Optimising time is a bonus on top of that.

Thanks

cessen commented 3 months ago

I don't want to deal with Arc<RefCell> (god forbid, Arc<Mutex>).

I doubt you'll need to. I don't mean this with any disrespect (I'm often known to just go for the easy solutions myself at times), but I think most if not all of the use cases people have for is_instance() in Ropey is due to not really thinking out what they really need.

For example, a common use case is to make a clone to do processing in a different thread, but then cheaply check if the original rope changed before utilizing the result of the processing, to make sure it's not stale. But that can be done just as cheaply (and with no need for Arc etc.) with a simple counter that accompanies the rope and increments whenever an edit is made.

Also, keep in mind that I have no plans to stop maintenance of Ropey 1.x even after 2.x is out. It will only be for bug fixes, no new features, but in practice that's been the case for a long time already, because it's effectively feature complete. In other words, if you're happy with 1.x then there's no reason you'll need to move to 2.x. Part of the goal of 2.x is to be a leaner library with a smaller API, so I'm sure some people will choose to stick with 1.x for that reason, and that's totally fine.

I actually want to optimise space by storing ropes in the undo/redo stack (when "big"). Optimising time is a bonus on top of that.

If you're trying to go that far into optimizing in ways that need deeper insight/control over the fundamental text data structure you use, I would actually recommend rolling your own rather than using Ropey or any other third party library. I'm reminded of this talk by John Carmack. At some point, to fully optimize a system, you need close to full control over all the components. Ropey can't reasonably be optimal for all use cases and text editor architectures.

And if you're writing a data structure just for your editor, you can actually get it up and running much faster than writing a library, because you only have to write the parts your editor actually needs. You could also explore something like a piece table, which might be more efficient/natural than a rope for the kinds of optimizations it sounds like you want to do.