Amanieu / intrusive-rs

Intrusive collections for Rust
Apache License 2.0
423 stars 50 forks source link

Great crate, major hole in it ;-) #18

Open przygienda opened 6 years ago

przygienda commented 6 years ago

Crate is great. Very common problem though in this kind of stuff. I insert into rbtree, then use the link list to change keys, tree ends up corrupted since it seems to pull local copies of keys (basically invariants change). Adapters would need to somehow let each other know when the key is being modified or some other kind of API fix to prevent this.

// get bunch elements into tree/lists & then flip keys & see whether sorting on the
// tree holds

    let mut a = LinkedList::new(MyAdapter::new());
    let mut b = SinglyLinkedList::new(MyAdapter2::new());
    let mut c = RBTree::new(MyAdapter3::new());

    for v in &[30, 40, 50, 60, 70, 80, 90, 100] {
        let mut test = Rc::new(Test {
            value: Cell::new(*v),
            ..Test::default()
        });
        a.push_front(test.clone());
        b.push_front(test.clone());
        c.insert(test);
    }

    {
// Find the first element which is greater than or equal to min
        let mut cursor = c.lower_bound_mut(Bound::Included(&40));

        let mut cont = true;
        // Iterate over all elements in the range [min, max]
        while !cursor.is_null() {
            let v = cursor.get();
            println!("{:?}", v);

            cursor.move_next();
        }
    }

    let mut v = 0xff; // pseudo rand

    let deft = Test {
        value: Cell::new(99),
        ..Test::default()
    };

    // divide by ten
    let mut mc = a.cursor_mut();

    mc.move_next();

    while ! mc.is_null() {
        let iv = &mc.get().unwrap_or(&deft).value;

        if let Some(imc) = mc.get() {
            println!("{} -> {}", iv.get(), iv.get() + v);
            imc.value.set(iv.get() + v);
        }

        v ^= iv.get();
        mc.move_next();
    }

    {
        let mut cursor = c.lower_bound_mut(Bound::Included(&40));

        let mut cont = true;
        // Iterate over all elements in the range [min, max]
        while !cursor.is_null() {
            let v = cursor.get();
            println!("{:?}", v);

            cursor.move_next();
        }
    }
Amanieu commented 6 years ago

Unfortunately I don't think it is possible to fix this hole reliably. Note that this isn't unique to this crate. If you look at the documentation of BTreeMap in the standard library, the last paragraph of the overview says this:

It is a logic error for a key to be modified in such a way that the key's ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code.

Our documentation says something similar:

Note that you are responsible for ensuring that the elements in a RBTree remain in ascending key order. This property can be violated, either because the key of an element was modified, or because the insert_before/insert_after methods of CursorMut were incorrectly used. If this situation occurs, memory safety will not be violated but the find, upper_bound, lower_bound and range may return incorrect results.

The good news is that, like BTreeMap, RBTree does not rely on Ord to maintain memory safety. The only negative impact is that search may return the wrong element and the elements won't be sorted in increasing key order.

przygienda commented 6 years ago

ok, understood. It's a very major pain in all languages/designs pursuiting this "intrinsic collection pointer" stuff (albeit it's often used due to memalloc efficiency). So that's why I'm stretching ;-)

Freewheeling thinking here: Since you have such clean adapters on top, couldn't we force the key to be something like recursive RWLock and the tree when element is inserted RLocks the key? When removed from tree it unlocks the rlock. only way to mod the key would be to remove the element from all trees & then WLock the key ... Lock could be hidden as abstract interface that allows for rlock() & wlock_scope(). trying to insert holding wlock into a collection would fail on "cannot read lock key" ...

Amanieu commented 6 years ago

That wouldn't actually help since you could use a custom type as a key which returned a random result in its Ord implementation. In the end, I think it's just too difficult or too expensive to enforce that keys are not changed, so I prefer simply leaving the responsibility to the user.

przygienda commented 6 years ago

hmm, ok but I don't follow the argument fully. if someone misimplements the Ord then nothing can be done & even normal tree won't work. Same as Hash changing and normal collections.

So, no interest in MisorderFreeRBTree ;-) ?

From long experience on large codebases the changing key problem is one of the hardest bugs to track, resolve and can be trivially introduced by someone fresh working on the code & not understanding all the collections and their invariants. Trees are not that bad but it gets even harder once you have e.g. ordered queues and someone changes keys. Having somehow the concept of a "mod-safe-key" that can only be mucked around once the struct is off all collections would go a long way to sell thjis package iME ;-)

Amanieu commented 6 years ago

Hmm, actually after thinking about this a bit, it might be possible to add this as an optional feature. We could add the following methods to KeyAdapter:

    fn lock_key(&self, value: &'a Self::Value) {}
    fn unlock_key(&self, value: &'a Self::Value) {}

They will be called before an object is inserted into the tree and after it is removed. By default they do nothing, but you can use with a custom key wrapper type to prevent it from being modified.

Thoughts?

przygienda commented 6 years ago

aha, yepp, @ first glance a good idea. Is the assumption that the intrinsic collection would lock on insertion/removal? That could work. I was thinking more around something like a ModSafeKey since here, if e.g. someone implements naively lock_key as exclusive read the whole thing would deadlock when 2 collections are inserted ...

Maybe having a RBTree and ModSafeRBTree with diffrent interface? Of course if someone mixes keymod safe and unsafe collections in same struct, they're on their own ... However, I'd expect that you fo9rced to do

struct X { key: ModSafeKey,

}

and if you use SafeRBTree and SafeOrderedQueue then you have to pass ModSafeKey as key anyway for it so yuo couldn't just do

struct X { key: ModSafeKey, safecoll1: ...::new(key), unsafecoll2: ::new(key), }

since unsafe won't take ModSafeKey and you have no chance to get to i32 within the key without going through collection ?

Just ruminating ... Never found a solution for that in C or even C++ (C++ more due to complexity, you know the hillarious pain modifying on a iterator in coollection in a loop can cause in STL ;-)

--- tony

On Thu, Apr 19, 2018 at 9:29 AM, Amanieu notifications@github.com wrote:

Hmm, actually after thinking about this a bit, it might be possible to add this as an optional feature. We could add the following methods to KeyAdapter:

fn lock_key(&self, value: &'a Self::Value) {}
fn unlock_key(&self, value: &'a Self::Value) {}

They will be called before an object is inserted into the tree and after it is removed. By default they do nothing, but you can use with a custom key wrapper type to prevent it from being modified.

Thoughts?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Amanieu/intrusive-rs/issues/18#issuecomment-382798811, or mute the thread https://github.com/notifications/unsubscribe-auth/ABo0Cx4yiDTqd4Ah7tkKltxFCRQqjvwoks5tqLtNgaJpZM4TbS2t .

Amanieu commented 6 years ago

Ah no, it's very simple actually: we can just make ModSafeKey<T> deref to T. Keep in mind that the locking (effectively just a reference count on the number of trees that the key in) only serves to prevent writing. You can always read the key.

Regarding a whole separate ModSafeRBTree, I really don't want to do that because it's a lot of code duplication for a very small change. Just allowing custom hooks in KeyAdapter should be enough.

przygienda commented 6 years ago

ok, yes, the key should be readable, for sure, we just want to prevent mod as long on ordered (or other invariant assuming) collection ...

So what if we force wrapping the key into something implementing a trait. The implementation is either ModUnsafe<> or ModSafe<>, both give you the locking methods implemented, one empty, the other making sure you can't mod if on collection ...

Generally, serving stuff over traits rather than adapters is more Rust like albeit I like adapters & protocols architectures and used them heavily, espeically in Python PEAK ...

On Thu, Apr 19, 2018 at 9:46 AM, Amanieu notifications@github.com wrote:

Ah no, it's very simple actually: we can just make ModSafeKey deref to T. Keep in mind that the locking (effectively just a reference count on the number of trees that the key in) only serves to prevent writing. You can always read the key.

Regarding a whole separate ModSafeRBTree, I really don't want to do that because it's a lot of code duplication for a very small change. Just allowing custom hooks in KeyAdapter should be enough.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Amanieu/intrusive-rs/issues/18#issuecomment-382804233, or mute the thread https://github.com/notifications/unsubscribe-auth/ABo0C5lQwKFvjY9H9cNpKzX3O4QTJ3N_ks5tqL9xgaJpZM4TbS2t .

Amanieu commented 6 years ago

Note that the key can only be modified if it is inside a Cell or a RefCell. An easy way to ensure the key can't be modified while it is used in a tree is to simply not put it in a cell type. Rust's lifetime system will then guarantee that the value is never changed.

przygienda commented 6 years ago

yupp, valid observation but it would be better the collection prevents it even when it's Cell<>'ed. I think we walked the space. Let's see whether you give it some cycles. I'm not using the crate right now but if I get to optimize something big here & start to use it, I may fork and work on that ... thanks. tony

On Thu, Apr 19, 2018 at 10:13 AM, Amanieu notifications@github.com wrote:

Note that the key can only be modified if it is inside a Cell or a RefCell. An easy way to ensure the key can't be modified while it is used in a tree is to simply not put it in a cell type. Rust's lifetime system will then guarantee that the value is never changed.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Amanieu/intrusive-rs/issues/18#issuecomment-382812659, or mute the thread https://github.com/notifications/unsubscribe-auth/ABo0C1KXmjEWV5vwT0vfRrcDOU18dJntks5tqMXEgaJpZM4TbS2t .