Add a `clear_with` method to each data structure.

novacrazy commented 5 days ago

I have a library that uses both the intrusive RBTree and the intrusive linked list, using UnsafeRef to share the node between them. However, because UnsafeRef doesn't implement dropping the pointer on its own, I currently use fast_clear on the tree and then remove the front item of the list in a loop, using UnsafeRef::into_box to drop it, then continue until no elements remain.

This loop of calling remove is sub-optimal because it hooks up the list nodes again despite only iterating forward to drop every node.

It's also infeasible/impossible to write my own UnsafeRef due to the heavy reliance on DefaultPointerOps.

Instead, I propose a clear_with method to provide a function to drop the pointers with custom behavior, with the regular clear being implemented as simply clear_with(drop). Codegen is identical in a quick test. This provides the flexibility I need while remaining fast and safe.

So now my own clear method looks like this:

pub fn clear(&mut self) {
    self.tree.fast_clear();

    self.list.clear_with(|node| {
        // SAFETY: Node is removed from both the tree and list
        let _ = unsafe { UnsafeRef::into_box(node) };
    });
}

I also fixed a handful of typos my IDE noticed.

novacrazy commented 5 days ago

After vs Before this PR assembly for clearing my structure. Pretty handy.

dfgdfhdsfhdfh

Amanieu commented 4 days ago

Can you achieve the same thing with .take().into_iter() and then manually freeing each pointer using the iterator?

novacrazy commented 4 days ago

That's a little bit better but the IntoIter still calls pop_front which calls remove, which results in the following assembly:

```asm test_clear_into_iter: push rsi sub rsp, 32 mov qword ptr [rcx + 16], 0 mov rax, rcx vxorps xmm0, xmm0, xmm0 mov rcx, qword ptr [rcx] vmovups xmmword ptr [rax], xmm0 test rcx, rcx jne .LBB4_1 .LBB4_6: add rsp, 32 pop rsi ret .LBB4_5: mov qword ptr [rcx], 1 add rcx, -16 mov edx, 64 mov r8d, 8 call __rust_dealloc mov rcx, rsi test rsi, rsi je .LBB4_6 .LBB4_1: mov rsi, qword ptr [rcx] mov rax, qword ptr [rcx + 8] test rsi, rsi je .LBB4_3 mov qword ptr [rsi + 8], rax .LBB4_3: test rax, rax je .LBB4_5 mov qword ptr [rax], rsi jmp .LBB4_5 ```

versus the single loop produced by clear_with:

```asm test_clear_with: push rsi push rdi sub rsp, 40 mov qword ptr [rcx + 16], 0 vxorps xmm0, xmm0, xmm0 mov r8, qword ptr [rcx] vmovups xmmword ptr [rcx], xmm0 test r8, r8 je .LBB2_3 mov rsi, qword ptr [rip + __imp_HeapFree] .LBB2_2: mov rdi, qword ptr [r8] mov qword ptr [r8], 1 add r8, -16 xor edx, edx mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0] call rsi mov r8, rdi test rdi, rdi jne .LBB2_2 .LBB2_3: add rsp, 40 pop rdi pop rsi ret ``` also it's curious why `clear_with` chooses to call `HeapFree` while the iter version calls `__rust_dealloc`

Amanieu commented 4 days ago

I think it would be better to optimize into_iter to not update links when iterating.

novacrazy commented 4 days ago

I just tried to replace pop_front in the iterator with code similar to that in clear, but it still seems to add some extra branches as part of the iterator usage:

pub unsafe fn pop_front_fast(&mut self) -> Option<<A::PointerOps as PointerOps>::Pointer> {
    use link_ops::LinkOps;

    if let Some(current) = self.head {
        let next = self.adapter.link_ops().next(current);

        self.adapter.link_ops_mut().release_link(current);

        let res = Some(
            self.adapter
                .pointer_ops()
                .from_raw(self.adapter.get_value(current)),
        );

        self.head = next;

        res
    } else {
        None
    }
}

with that used in IntoIter::next instead of pop_front

New assembly

```asm test_clear: push rsi push rdi push rbx sub rsp, 32 mov qword ptr [rcx + 16], 0 vxorps xmm0, xmm0, xmm0 mov rsi, qword ptr [rcx] mov r8, qword ptr [rcx + 8] vmovups xmmword ptr [rcx], xmm0 test r8, r8 je .LBB2_3 mov rdi, qword ptr [rip + __imp_HeapFree] .LBB2_2: mov rbx, qword ptr [r8 + 8] mov qword ptr [r8], 1 add r8, -16 xor edx, edx mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0] call rdi mov r8, rbx test rbx, rbx jne .LBB2_2 .LBB2_3: test rsi, rsi je .LBB2_5 .LBB2_4: mov rax, qword ptr [rsi] mov qword ptr [rsi], 1 mov rsi, rax test rax, rax jne .LBB2_4 .LBB2_5: add rsp, 32 pop rbx pop rdi pop rsi ret ```

novacrazy commented 4 days ago

I could probably implement fold on IntoIter Iterator with the same code as clear_with, so at least for_each would work better.

novacrazy commented 4 days ago

Oh, I made a mistake and used a pop_back_fast version in next() of the iterator instead. using pop_front_fast actually results in this asm:

test_clear:
    push rsi
    push rdi
    sub rsp, 40
    mov qword ptr [rcx + 16], 0
    vxorps xmm0, xmm0, xmm0
    mov r8, qword ptr [rcx]
    vmovups xmmword ptr [rcx], xmm0
    test r8, r8
    je .LBB2_3
    mov rsi, qword ptr [rip + __imp_HeapFree]
.LBB2_2:
    mov rdi, qword ptr [r8]
    mov qword ptr [r8], 1
    add r8, -16
    xor edx, edx
    mov rcx, qword ptr [rip + std::sys::alloc::windows::HEAP.0]
    call rsi
    mov r8, rdi
    test rdi, rdi
    jne .LBB2_2
.LBB2_3:
    add rsp, 40
    pop rdi
    pop rsi
    ret

which is ideal.

My mistake.

I'll clean this up and push changes instead of using clear_with

novacrazy commented 4 days ago

I have no idea how to implement fold on the RBTree IntoIter, so the specialization of fold may only be for the lists.

novacrazy commented 3 days ago

Honestly after experimenting with the iterator optimization today, I still think a clear_with is easier and cleaner. We can more easily document how panicking can lead to memory leaks, for example.

I can try again eventually, but I may not have time to explore alternatives myself more.

Amanieu commented 3 days ago

I have no idea how to implement fold on the RBTree IntoIter, so the specialization of fold may only be for the lists.

This could be implemented using recursion (visit left subtree, then self, then right subtree), but it's not obvious that this would be faster in practice.

Honestly after experimenting with the iterator optimization today, I still think a clear_with is easier and cleaner. We can more easily document how panicking can lead to memory leaks, for example.

I don't think this justifies adding a whole new API. I would much prefer just making iterators better.

novacrazy commented 2 days ago

Then for now I'm just going to get away with using:

fn clear_list<K, V>(list: &mut LinkedList<NodeListAdapter<K, V>>) {
    let mut front = list.front();

    while let Some(ptr) = front.clone_pointer() {
        front.move_next();

        let _ = unsafe { UnsafeRef::into_box(ptr) };
    }

    // ensure LinkedList::drop does nothing
    list.fast_clear();
}

for my personal use.

Amanieu / intrusive-rs

Add a `clear_with` method to each data structure. #95