cessen / ropey

A utf8 text rope for manipulating and editing large texts.
MIT License
1.04k stars 46 forks source link

correcly fixup tree when a removal occurs at the chunk boundery #67

Closed pascalkuthe closed 1 year ago

pascalkuthe commented 2 years ago

While fuzzing #66 I found a case where the tree ends up in an invalid state after a call to remove (see test case with this PR).

fix_tree_seam incorrectly handled the case of a removal happens exactly at the chunk boundary. While the implementation did in fact consider that edge case, the detection was incorrect: It checked whether the current character was equal to the end of the child. However, in that case search_char_idx would return the next child (the end is exclusive). To fix the problem I changed fix_tree_seam to check for start_info.chart == char_idx instead (and to check the previous child instead of the next one).

cessen commented 2 years ago

Thanks for catching this!

Unless I'm misunderstanding something, it looks like there's also a refactor of the code here, in addition to the relevant change. Can you keep the change to just the minimal change to fix the issue? That would also make this easier to review.

pascalkuthe commented 2 years ago

Thanks for catching this!

Unless I'm misunderstanding something, it looks like there's also a refactor of the code here, in addition to the relevant change. Can you keep the change to just the minimal change to fix the issue? That would also make this easier to review.

I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug. It mostly looks that way because some checks now take place at different places. I can try to rework the PR to make it look more similar to the original code but I am not sure if that would work well (but its also possible I am just currently not seeing how to easily change the original code to match the behaviour in this PR). I think the diff makes the changes look larger then they actually are. If you look at the two version side by side it should be a small change really.

cessen commented 1 year ago

I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug.

Ah, got it. Your description of the bug made it sound trivial (like changing a < to a <= in a conditional check, or something like that). But I guess it's more involved.

pascalkuthe commented 1 year ago

I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug.

Ah, got it. Your description of the bug made it sound trivial (like changing a < to a <= in a conditional check, or something like that). But I guess it's more involved.

No its a bit more involved then that. I will try again at distilling the change to a minimum explanation:

The edge-case this PR covers is if a change occurs at a chunk boundary, in that case we need to call merge_distribute with the neighboring chunk. The previous implementation already covered this edge-case but the check was incorrect. The check occurs twice in this function so I just choose one example here. The old implementation performed the following check:

let (child_i, start_info) = children.search_char_idx(char_idx);
let end_info = start_info + children.info()[child_i];
if end_info.chars as usize == char_idx && (child_i + 1) < children.len() { /* handle special case */ }

This check triggers if following two conditions are true:

This means that this condition can actually never be triggered (verified by fuzzing this function with an unreachable inside the special case).

Instead I implemented the following check:

let (child_i, start_info) = children.search_char_idx(char_idx);
if start_info.chars as usize == char_idx && child_i != 0 {

Now we check whether we are exactly at the start of the child because search_char_idx already returns the next child in the case we wanted to check above and in that case we merge with the previous child.

At the end of the function this chang really is that simple. However if you look at the start of the function there was perviously three special cases:

However with the fix applied we want to check the following conditions:

So the code needed to change a bit more here. Hope that clear that up :)

Btw if I squint at this a ton it might be possible to rearrange the start of the function to look more like the original code but that makes my head spin a little (and would definitely involve quite a bit of duplicate code) so maybe I am just thinking about it weirdly. So it might have been a small accidental refactor at the start of the function but it was definitely not intentional

cessen commented 1 year ago

Thanks for the explanation! Yeah, looking at the old code again, it's kind of a mess. Your new code is both easier to follow and (of course) fixes the bug.

Thanks so much!