Fantom-foundation / lachesis-rs

Lachesis BFT consensus for permission-less networks, in Rust
MIT License

Use of locks in Lachesis, as opposed to lock-free DS #57


SamuelMarks commented 5 years ago

Was discussing this line with @Maxime2: https://github.com/Fantom-foundation/lachesis-rs/blob/7788c17/lachesis-rs/src/swirlds.rs#L937

Would welcome a discussion on lock-free data structures and other alternatives to mutexes; worried about bottlenecks otherwise.

AgustinCB commented 5 years ago

I think either the line you pointed to is wrong, or I fail to see how it relates to the subject.

Can you expand on the problem you're worried about?

As I understand it, swirlds currently uses only two mutexes: one for the state of the node and one for the hashgraph. The node-state mutex is used widely for a bunch of things (see https://github.com/Fantom-foundation/lachesis-rs/blob/7788c17/lachesis-rs/src/swirlds.rs#L46), and that can cause a bottleneck. There are some things there that we can move out; for example, the usize in the state could be an AtomicUsize instead (see the sketch below). Some of the algorithm could also be refactored to use more immutable data structures instead of mutable state, which would make the app more memory hungry, but less "lock-y." Other data structures, such as the network and the rounds list, could potentially be replaced with non-standard lock-free data structures.
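For instance, a minimal sketch of the AtomicUsize idea (hypothetical field names, not the actual swirlds state):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct NodeState {
    // Before: this counter lived inside the one big state mutex.
    // After: it can be bumped without blocking anything else.
    pending: AtomicUsize,
    rest: Mutex<Vec<u8>>, // whatever genuinely still needs the lock
}

impl NodeState {
    fn bump(&self) -> usize {
        // Lock-free read-modify-write; concurrent callers never block.
        self.pending.fetch_add(1, Ordering::SeqCst)
    }
}

fn main() {
    let state = NodeState {
        pending: AtomicUsize::new(0),
        rest: Mutex::new(Vec::new()),
    };
    state.bump();
    assert_eq!(state.pending.load(Ordering::SeqCst), 1);
}
```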

There are some that can't, though. Lock-free data structures can't provide safe iteration or length (I read somewhere that this can be proved, but I haven't seen the proof). We could mitigate the problem by researching whether it's possible to build a custom HashMap (and a HashSet on top of it) that uses lock-free insert, delete, check and get, but takes a lock for iteration and length calculation; my gut feeling, however, is that it's not trivial.
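To make the hybrid idea concrete, here's a rough sketch leaning on crossbeam-skiplist (a third-party crate, not in the repo; API as of 0.1) for the per-key operations. Honest caveat: writers here take a shared lock, so they only stay non-blocking with respect to each other; a fully lock-free version would need something fancier, like epoch-based snapshots.

```rust
use crossbeam_skiplist::SkipMap;
use std::sync::RwLock;

struct HybridMap<K: Ord + Send + 'static, V: Send + 'static> {
    inner: SkipMap<K, V>,
    // Writers take this in shared mode (they still run concurrently);
    // len()/keys() take it exclusively to see a quiescent map.
    gate: RwLock<()>,
}

impl<K: Ord + Send + 'static, V: Send + 'static> HybridMap<K, V> {
    fn new() -> Self {
        HybridMap { inner: SkipMap::new(), gate: RwLock::new(()) }
    }

    // Per-key operations never serialize against each other.
    fn insert(&self, key: K, value: V) {
        let _shared = self.gate.read().unwrap();
        self.inner.insert(key, value);
    }

    fn remove(&self, key: &K) {
        let _shared = self.gate.read().unwrap();
        self.inner.remove(key);
    }

    fn contains(&self, key: &K) -> bool {
        self.inner.contains_key(key)
    }

    // Exact length and safe iteration: block out writers first.
    fn len(&self) -> usize {
        let _exclusive = self.gate.write().unwrap();
        self.inner.len()
    }

    fn keys(&self) -> Vec<K>
    where
        K: Clone,
    {
        let _exclusive = self.gate.write().unwrap();
        self.inner.iter().map(|e| e.key().clone()).collect()
    }
}

fn main() {
    let map = HybridMap::new();
    map.insert("a", 1);
    map.insert("b", 2);
    assert_eq!(map.len(), 2);
    assert_eq!(map.keys(), vec!["a", "b"]);
}
```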

In lachesis, I'm doing things differently: for starters, I use multiple mutexes instead of only two. This reduces the number of bottlenecks, but also makes it easier to introduce deadlocks if you don't pay attention (and I didn't, a couple of times). The result is slightly more performant, but at the cost of development time (I spent a lot of time reasoning about lock ordering and checking for deadlocks).
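The discipline that makes multiple mutexes survivable is a single global lock order. A minimal sketch with hypothetical fields:

```rust
use std::sync::Mutex;

struct Node {
    state: Mutex<u64>,          // global lock order: 1st
    hashgraph: Mutex<Vec<u64>>, // global lock order: 2nd
}

impl Node {
    // Every code path acquires locks in the same order (state before
    // hashgraph). A thread holding `hashgraph` never waits on `state`,
    // so the classic two-lock cycle (thread A holds 1 and wants 2 while
    // thread B holds 2 and wants 1) cannot form.
    fn record(&self, value: u64) {
        let mut state = self.state.lock().unwrap();
        let mut graph = self.hashgraph.lock().unwrap();
        *state += 1;
        graph.push(value);
    }
}

fn main() {
    let node = Node {
        state: Mutex::new(0),
        hashgraph: Mutex::new(Vec::new()),
    };
    node.record(42);
    assert_eq!(*node.state.lock().unwrap(), 1);
}
```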

I welcome suggestions, as I struggled to find something better. However, I think we have a couple of low-hanging fruits we can grab first: using more Atomic* types, and swapping in lock-free data structures wherever no iteration or length check is needed.

Maxime2 commented 5 years ago

The line is pointed to correctly, though no explanation was given. At this place we create a new event, and the first argument of Event::new() takes the payload; here it would be an array of transactions received by the node but not yet put into the hashgraph. We assume there will be multiple clients supplying transactions to a node, and these transactions should be collected into a pool until the next create_new_head() call, which empties or reduces that pool. Thus we have a classical multiple-producer, single-consumer pattern here, so we either need a lock managing access to the pool, or a lock-free implementation of that pool.
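For illustration, the shape of that pattern with std's mpsc channel as the pool (a sketch, not the repo's code; Transaction is a stand-in type):

```rust
use std::sync::mpsc;
use std::thread;

type Transaction = Vec<u8>; // stand-in for the real payload type

fn main() {
    // Multiple producers, single consumer: each client holds a Sender,
    // the node holds the sole Receiver.
    let (pool_in, pool_out) = mpsc::channel::<Transaction>();

    let clients: Vec<_> = (0..4u8)
        .map(|i| {
            let pool_in = pool_in.clone();
            thread::spawn(move || pool_in.send(vec![i]).expect("node is running"))
        })
        .collect();
    for c in clients {
        c.join().unwrap();
    }

    // create_new_head(): drain whatever has accumulated into one payload.
    let payload: Vec<Transaction> = pool_out.try_iter().collect();
    assert_eq!(payload.len(), 4);
}
```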

AgustinCB commented 5 years ago

Ahhhhh... That makes sense! Well, at the beginning I thought about using AtomicPtr there, but that could cause the loss of transactions... Tomorrow I'll review the possibility of using lock-free lists, since we don't need single-element access and could get away with a linked list... Something like the sketch below.
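For example, with crossbeam's SegQueue (a third-party lock-free queue; pop() returns Option as of crossbeam-queue 0.3), the pool could look like this:

```rust
use crossbeam_queue::SegQueue;
use std::sync::Arc;
use std::thread;

fn main() {
    // The transaction pool as a lock-free queue: producers push without
    // blocking each other, and nothing is dropped the way a bare
    // AtomicPtr swap could drop transactions.
    let pool: Arc<SegQueue<u64>> = Arc::new(SegQueue::new());

    let producers: Vec<_> = (0..4u64)
        .map(|i| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || pool.push(i))
        })
        .collect();
    for p in producers {
        p.join().unwrap();
    }

    // create_new_head() pops everything accumulated so far.
    let mut payload = Vec::new();
    while let Some(txn) = pool.pop() {
        payload.push(txn);
    }
    assert_eq!(payload.len(), 4);
}
```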

SamuelMarks commented 5 years ago

Yeah, maybe even the append-only list I referenced in #rust on Slack.

Good analyses, both of you.

We should be able to solve the length issue with a counter and a CAS; see the sketch below. Not sure about the rest. It may also be worthwhile for us to rewrite the consensus algorithm to better fit a lock-free data-structure implementation.
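A minimal sketch of the counter idea, written as an explicit CAS loop (std's fetch_add does the same job in a single call):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Maintain a length counter via compare-and-swap: read the current value,
// attempt to swap in current + 1, and retry if another thread got there first.
fn increment(len: &AtomicUsize) {
    let mut current = len.load(Ordering::Relaxed);
    loop {
        match len.compare_exchange_weak(
            current,
            current + 1,
            Ordering::SeqCst,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            Err(observed) => current = observed, // lost the race; retry
        }
    }
}

fn main() {
    let len = AtomicUsize::new(0);
    increment(&len);
    assert_eq!(len.load(Ordering::SeqCst), 1);
}
```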

More on that later!

AgustinCB commented 5 years ago

I don't think a CAS solves the problem, though. Adding to the lock-free list is a two-step operation: first atomically add to the list, then atomically increment the counter. If a context switch happens after the first step but before the second, handing execution to a thread that is checking the length of the list, that thread reads a stale value (see the sketch below). It's not great.
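Concretely, the window looks like this (same hypothetical SegQueue-backed pool as above):

```rust
use crossbeam_queue::SegQueue;
use std::sync::atomic::{AtomicUsize, Ordering};

struct CountedPool<T> {
    items: SegQueue<T>,
    count: AtomicUsize,
}

impl<T> CountedPool<T> {
    fn push(&self, item: T) {
        self.items.push(item); // step 1: the item is already visible to poppers
        // <-- a thread calling len() right here reads a count that is one
        //     short of what the queue actually holds
        self.count.fetch_add(1, Ordering::SeqCst); // step 2
    }

    fn len(&self) -> usize {
        // Only ever an approximation while pushes are in flight.
        self.count.load(Ordering::SeqCst)
    }
}

fn main() {
    let pool = CountedPool {
        items: SegQueue::new(),
        count: AtomicUsize::new(0),
    };
    pool.push(1u64);
    assert_eq!(pool.len(), 1); // exact here only because nothing is in flight
}
```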