Closed pascalkuthe closed 1 year ago
Thanks for catching this!
Unless I'm misunderstanding something, it looks like there's also a refactor of the code here, in addition to the relevant change. Can you keep the change to just the minimal change to fix the issue? That would also make this easier to review.
Thanks for catching this!
Unless I'm misunderstanding something, it looks like there's also a refactor of the code here, in addition to the relevant change. Can you keep the change to just the minimal change to fix the issue? That would also make this easier to review.
I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug. It mostly looks that way because some checks now take place at different places. I can try to rework the PR to make it look more similar to the original code but I am not sure if that would work well (but its also possible I am just currently not seeing how to easily change the original code to match the behaviour in this PR). I think the diff makes the changes look larger then they actually are. If you look at the two version side by side it should be a small change really.
I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug.
Ah, got it. Your description of the bug made it sound trivial (like changing a <
to a <=
in a conditional check, or something like that). But I guess it's more involved.
I didn't really refactor anything (intentionally at least). All changes I made were strictly intended to fix the bug.
Ah, got it. Your description of the bug made it sound trivial (like changing a
<
to a<=
in a conditional check, or something like that). But I guess it's more involved.
No its a bit more involved then that. I will try again at distilling the change to a minimum explanation:
The edge-case this PR covers is if a change occurs at a chunk boundary, in that case we need to call merge_distribute
with the neighboring chunk.
The previous implementation already covered this edge-case but the check was incorrect.
The check occurs twice in this function so I just choose one example here.
The old implementation performed the following check:
let (child_i, start_info) = children.search_char_idx(char_idx);
let end_info = start_info + children.info()[child_i];
if end_info.chars as usize == char_idx && (child_i + 1) < children.len() { /* handle special case */ }
This check triggers if following two conditions are true:
end_info.chars == char_idx
: this means char_idx
is one past the end of child_i
, however if that is the case search_char_idx
would already return the next child unless you are looking a the last childThis means that this condition can actually never be triggered (verified by fuzzing this function with an unreachable
inside the special case).
Instead I implemented the following check:
let (child_i, start_info) = children.search_char_idx(char_idx);
if start_info.chars as usize == char_idx && child_i != 0 {
Now we check whether we are exactly at the start of the child because search_char_idx
already returns the next child in the case we wanted to check above and in that case we merge with the previous child.
At the end of the function this chang really is that simple. However if you look at the start of the function there was perviously three special cases:
merge_distribute
with the next childmerge_distribute
with the previous child if the current child or the previous child is invalidchild_i == 0
) in that case distribute with the next childHowever with the fix applied we want to check the following conditions:
child_i == 0
(we can never distribute with the previous child) => distribute with the next child if the current child is invalidSo the code needed to change a bit more here. Hope that clear that up :)
Btw if I squint at this a ton it might be possible to rearrange the start of the function to look more like the original code but that makes my head spin a little (and would definitely involve quite a bit of duplicate code) so maybe I am just thinking about it weirdly. So it might have been a small accidental refactor at the start of the function but it was definitely not intentional
Thanks for the explanation! Yeah, looking at the old code again, it's kind of a mess. Your new code is both easier to follow and (of course) fixes the bug.
Thanks so much!
While fuzzing #66 I found a case where the tree ends up in an invalid state after a call to
remove
(see test case with this PR).fix_tree_seam
incorrectly handled the case of a removal happens exactly at the chunk boundary. While the implementation did in fact consider that edge case, the detection was incorrect: It checked whether the current character was equal to the end of the child. However, in that casesearch_char_idx
would return the next child (the end is exclusive). To fix the problem I changedfix_tree_seam
to check forstart_info.chart == char_idx
instead (and to check the previous child instead of the next one).