darkautism / lfqueue

Minimal lock-free queue ever!
Do What The F*ck You Want To Public License
132 stars 27 forks

New branch: low-contention #7

Closed pcordes closed 6 years ago

pcordes commented 6 years ago

This builds on my previous pull-request.

Removing unneeded barriers and optimizing with XCHG instead of CAS doesn't make things worse, as far as I can tell.

This is just really really hard to get right, and your current HP[tid] idea might have fundamental design flaws.

With garbage collection to solve the reclamation problem, a linked list works pretty well. But without garbage collection, it's a huge problem. (See http://www.yebangyu.org/LockFreeProgrammingPractice.pdf for a collection of some papers, including one showing just how complicated a lock-free linked list is, if you really want to free nodes instead of just using a free list. ABA problems make it extra hard.)

There's a reason everyone uses circular buffers like the MPMC queue analyzed in https://stackoverflow.com/questions/45907210/lock-free-progress-guarantees. (From https://liblfds.org/). It should be much more efficient than this linked-list thing, because it doesn't have to scan an array in the consumer.

In your version, readers contend heavily for access to that array as well as other shared data. malloc/free probably contend across threads, too, so that's another big reason for a fixed-size buffer.

darkautism commented 6 years ago

I hit the assert in this branch too.
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Program: bin/test_p100c10

The old version (branch FSHW) does not use any memory barriers, just the CAS operator.

In the new branch (HP), I'm trying to implement the Hazard Pointers algorithm, and I found a double-free bug. I use these memory barriers to solve the double-free bug. I did a lot of research to find out why it happens, but the manual always tells me malloc/free is thread-safe.

This is why the latest version is filled with memory barriers to prevent this bug. I know memory barriers cost more CPU and make the program slower, but a crash is more unbearable than slower performance.

If you know how to remove the memory barriers without the crash, please let me know. Thanks.

darkautism commented 6 years ago

HP[tid] is not my idea. Here is the paper; I don't take any credit for this algorithm.

pcordes commented 6 years ago

I wasn't seeing any double-free bugs, just use-after-free. That's a separate thing.

> This is why the latest version is filled with memory barriers to prevent this bug.

But they don't prevent the bug. My tests branch shows that just adding more checks to your code, without removing any memory barriers, already detects problems.

I see the assert trigger in that branch on my i7-6700k (gcc7.3 -O3). I made sure to keep all your memory barriers and stuff you were doing in that version, and just added the poison + assert checking.

Therefore you have a bug somewhere. I don't know where; I'm not that interested in finding it because this whole thing looks very inefficient. It's also not in a usable state for other reasons: the free list can grow very large, like 2.5GiB memory usage for the p100c10 test.

Most of the changes in this branch are functionally equivalent (I think all of them are unless I made mistakes). If you remove the assert checking in this version (compile with make CPPFLAGS=-DNDEBUG), the tests pass consistently, just like they did before I added the assert to your original.

Replacing CAS loops with XCHG is definitely a win in the 2 cases where I did that, and I'm sure it's safe.

This would be easier to read using C11 stdatomic; then you could see seq-cst stores vs. release stores clearly. IDK if you want to rewrite it that way.

pcordes commented 6 years ago

> HP[tid] is not my idea. Here is the paper

It would have been helpful to include a comment in the code, then, describing how it's supposed to work.

darkautism commented 6 years ago

I removed the assert and hit this crash.

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./bin/test_p100c10'.
Program terminated with signal 11, Segmentation fault.
#0  free_pool (freeall=false, ctx=0x7ffe513df600) at lfq.c:45
45          if ( (!p->can_free) || (!p->free_next) || inHP(ctx, (struct lfq_node *)p) )
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64
(gdb) l
40          return; // this pool free is not support multithreading.
41      volatile struct lfq_node *p;
42
43      for ( int i = 0 ; i < MAXFREE || freeall ; i++ ) {
44          p = ctx->fph;
45          if ( (!p->can_free) || (!p->free_next) || inHP(ctx, (struct lfq_node *)p) )
46              goto exit;
47          ctx->fph = p->free_next;
48          free((void *)p);
49      }
```

Test: `while ./bin/test_p100c10; do echo success; done`

It happened after running for 3 hours. Your code has better performance than mine, but hits another bug. coredump

pcordes commented 6 years ago

Interesting. I wonder if the node got freed after being added to the free list.

We know the code has use-after-free bugs, and segfaults are an obvious consequence if the whole page gets unmapped. But this is in a different part of the code from the one the assert is checking. Would be interesting to add an assert there, too, to look for use-after-free in that case without the whole page having to be unmapped for you to detect it.


Or possibly my idea of using a union {next, free_next} was a problem.

Hmm, maybe another thread can be looking at (or preparing to write) the next pointer when we add it to the free-list. I don't think the producer could do that, because new nodes get added with next=NULL, and the consumer will never free the last node.