Open jeehoonkang opened 6 years ago
Hey, this looks amazing! :) I haven't read the proof in detail yet, but here go a few quick comments/questions:
I'd like to write a paper out of https://github.com/jeehoonkang/crossbeam-rfcs/blob/deque-proof/text/2017-07-23-relaxed-memory.md and this. Some of my colleagues in academia (including my supervisor) are also reading this proof.
I don't have any benchmark numbers yet.. I'll prepare soon before merging https://github.com/crossbeam-rs/crossbeam-deque/pull/2.
I couldn't properly run CDSChecker.. I followed the instruction in https://github.com/computersforpeace/model-checker, but it always say OUT OF BOOTSTRAP MEMORY
when running ./run.sh
. Do you have any idea?
I just fixed a few bugs, but I don't think all the bugs are squashed right now. I'll reread the proof 2-3 more times.
I think the RFC itself is now ready to be merged. I checked the proof several times, and updated it. @stjepang please have a look!
Though the implementation needs some more work:
crossbeam-deque
doesn't have any benchmark. It's better to make one, and see how this RFC improves (or degrades!) the performance.@stjepang Thanks! I revised the document as you mentioned.
I'm confident with this proof. But since it will be deployed in production (Firefox), it's better to peer-review this proof, at least. Before merging it, I'd like to wait for my supervisor @gilhur to finish reading this proof.
This new orderings seem like a pure win over the current implementation, especially so on weakly-ordered architectures.
I thought the same thing, but it turns out that the original and the new version makes almost identical x86 and ARM binary. In particular, I thought that a CAS w/ release/acquire orderings for success/failure cases is more efficient than CAS w/ seqcst/relaxed, but they are actually compiled to the same instruction in x86 and ARMv8. (Also in ARMv7 for GCC, while theoretically the former can be more efficiently compiled).
As you suggested in IRC, I'll benchmark this branch w/ rayon.breadth_first()
. I don't expect a huge win in even weaker architectures, though.
Still I believe it is good to merge this branch, because it'll give me a better chance for paper acceptance it more clearly reveals the synchronizations conducted in this data structure. In particular, in my opinion, it's hard to understand the meaning of SeqCst
load/store/rmw. What do you think?
If you agree, I'd like to delete
is_pinned
and replace it withpin_fence
.
I think we can do it without an RFC, since it doesn't change the existing API. But maybe pin_fence
is not the best name for it. Here I'll trust your judgment :)
@jeehoonkang Any news on this issue? How's the review of this proof going?
Sorry for inactivity these days. My colleage and I are working on a different paper, so revising this proof had been postponed probably until mid July. I guess the merge of this PR is not urgent, right? I'll definitely come back and merge this PR :)
Just checking, take your time. :)
Rendered Implementation
This is a linearization proof of the Chase-Lev work-stealing deque. To the best of my knowledge, it is the first publicly available linearization proof (attempt) of the Chase-Lev deque :)