Race condition - Githubissues

G-M-twostay commented 1 year ago

Hi, I saw that the sorted linked list is based on Harris's linked list, but according to my understanding, it's not correctly written. Harris's linked list is based on 2 assumptions: 1. Atomic operation on node.next can see changes to node.delete. 2. node.next on a deleted node is still valid and can be used to track to the original next. In this implementation, I see that: 1. node.delete and node.next can't be dealt with in a single atomic operation. This is problematic, consider: node.delete can change immediately before(after the if checks) or during a CAS operation on node.next, and during this process, a successful physical deletion can happen before the CAS operation completes/starts, therefore, the new node is linked onto a deleted node. This is my understanding, correct me if I'm wrong.

I encountered the above problem in my initial attempts to implement such a hashmap using Harris's linked list.

With this in mind, I designed a few cases that can reflect the above problem. However, I'm not sure whether the failures in the below cases are solely caused by the above reason or is/are caused by other problems. It appears to me that at least on my end Case1 has some other problem because a given key is guaranteed to fail. Anyway, let's see these cases. Case 1:

func BenchmarkHaxMap_Case1(b *testing.B) {
    b.StopTimer()
    wg := sync.WaitGroup{}
    for i := 0; i < b.N; i++ {
        M := haxmap.New[int, int]()
        b.StartTimer()
        for k := 0; k < iter0; k++ {
            wg.Add(1)
            go func(l, h int) {
                for j := l; j < h; j++ {
                    M.Set(j, j)
                }
                for j := l; j < h; j++ {
                    _, a := M.Get(j)
                    if !a {
                        b.Error("key doesn't exist", j)
                    }
                }
                for j := l; j < h; j++ {
                    x, _ := M.Get(j)
                    if x != j {
                        b.Error("incorrect value", j, x)
                    }
                }
                wg.Done()
            }(k*elementNum0, (k+1)*elementNum0)
        }
        wg.Wait()
        b.StopTimer()
    }
}

Case 2:

func BenchmarkHaxMap_Case3(b *testing.B) {
    b.StopTimer()
    wg := &sync.WaitGroup{}
    for a := 0; a < b.N; a++ {
        M := haxmap.New[int, int]()
        b.StartTimer()
        for j := 0; j < iter0; j++ {
            wg.Add(1)
            go func(l, h int) {
                defer wg.Done()
                for i := l; i < h; i++ {
                    M.Set(i, i)
                }

                for i := l; i < h; i++ {
                    _, x := M.Get(i)
                    if !x {
                        b.Errorf("not put: %v\n", i)
                    }
                }
                for i := l; i < h; i++ {
                    M.Del(i)

                }
                for i := l; i < h; i++ {
                    _, x := M.Get(i)
                    if x {
                        b.Errorf("not removed: %v\n", i)
                    }
                }

            }(j*elementNum0, (j+1)*elementNum0)
        }
        wg.Wait()
        b.StopTimer()
    }

}

const (
    iter0       = 1 << 3
    elementNum0 = 1 << 10
)

If you increase the data size, this problem becomes more severe. You can delete all the benchmark and timing things.

Modifying these tests to sync.Map or an ordinary map with Mutex will show that no failures happen. In addition, cornelk's hashmap also fails at these tests.

alphadose commented 1 year ago

@G-M-twostay understood, I will take a look at this. Might be related to https://github.com/alphadose/haxmap/issues/23

G-M-twostay commented 1 year ago

If I were correct about the race condition, then to solve it, simply enforce consistency between node.next and node.delete. One such way is to make use an additional struct to combine these 2 fields and directly operate on the immutable struct.

alphadose commented 1 year ago

@G-M-twostay I couldn't understand your solution completely. Can you give me an example code snippet ?

G-M-twostay commented 1 year ago

@alphadose https://secondboyet.com/Articles/LockFreeLinkedList3.html is a better explanation. https://github.com/G-M-twostay/Go-Utils/blob/master/Maps/ChainMap/Node.go. Harris's linked list originally used pointer tagging in C++, but unfortunately that's impossible in Go in my knowledge due to garbage collection.

ItalyPaleAle commented 9 months ago

I am looking at haxmap but am a bit concerned by this issue. After playing with it a bit, it appears it may be related to a race condition while resizing.

I can repro the error easily with @G-M-twostay's code, even as a Test and not a Benchmark. But, if I allocate the haxmap with a larger initial capacity, the error does not appear.

// Old
M := haxmap.New[int, int]()
// New
M := haxmap.New[int, int](iter0 * elementNum0)

-- EDIT

It looks like this is a known "issue". I just saw this in the docs:

If a resizing operation is happening concurrently while calling Set() then the item might show up in the map only after the resize operation is finished

G-M-twostay commented 9 months ago

I am looking at haxmap but am a bit concerned by this issue. After playing with it a bit, it appears it may be related to a race condition while resizing.

I can repro the error easily with @G-M-twostay's code, even as a Test and not a Benchmark. But, if I allocate the haxmap with a larger initial capacity, the error does not appear.
// Old
M := haxmap.New[int, int]()
// New
M := haxmap.New[int, int](iter0 * elementNum0)
-- EDIT

It looks like this is a known "issue". I just saw this in the docs:

If a resizing operation is happening concurrently while calling Set() then the item might show up in the map only after the resize operation is finished

I believe it's related to deletion.

alphadose / haxmap

Race condition #32