Fantom-foundation / go-lachesis

aBFT consensus for permission-less networks
https://fantom.foundation
MIT License
223 stars 78 forks source link

Proposed new architecture #166

Closed SamuelMarks closed 4 years ago

SamuelMarks commented 5 years ago

It might be about time to split this repository in 12. One for each consensus, CLI client, common utilities, interfaces and protobuf files (including a common error proto).

Here is what I'm envisioning:

Proposed architecture: arch

Already done major work in this direction with many crates inside a single workspace in the Rust code base.

Advantages:

Disadvantages:


Would welcome your thoughts.

dev10 commented 5 years ago

I agree with all the points above. As the project becomes more complex with overlapping parts/ideas there is definitely a need for separation of concerns.

This will allow for:

Great idea.

Potential problems are:

devintegral commented 5 years ago

I think it is not time to split go-lachesis repository in 12 yet. It might be sense if we have well working parts (consensus, network, etc) with no volatile interfaces. At the moment we don't have any of this: poset and posposet are in development stage now, poset and node is not splitted well, and so on. Partitioning by go-packages is enough for independently working at this stage. Single github repository is necessary for fast change review and discussion.

All we need is a review before merging (@SamuelMarks ;)) and cover note for changes (@dev10 ;)). And some knowledge about other teams tasks through coordinator. Separated repositories don't make it easier.

Maxime2 commented 5 years ago

The first question is what this repository is for?

If it is for lachesis consensus algorithm as stated in the name, then swirlds, paxos, raft consensus implementations in the scheme above should better be in separate repositories.

If it is for, let call it 'Fantom consensus algorithm', which implements all of mentioned above consensus algorithms it is better to rename the repo and reorganise internal structure of it reserving place for all these algorithms.

But the root question is that we are in fact having two implementations of lachesis algorithm which are conflicting for going into master of this repo. These implementations would share some basic functionality (e.g. transport protocol, serialisation), though would need some specific functionality (e.g. storage engines) for better performance.

I think it would be better to retain this repo for lachesis consensus algorithm only, so we place other consensus algorithms in separate repos, but the implementations of these algos should use basic functionalities of this repo as much as possible. Then we need to try to harbour two lachesis implementations in this repo to make them interchangeable until we see which is better.

devintegral commented 5 years ago

@Maxime2, great! I am about the same. We have a few ways to implement Lachesis consensus:

And it is nice for project, we will see which implementation is better later.

SamuelMarks commented 5 years ago

Thanks for your input. Will keep this proposal open for a little while, to give time for the team to give their viewpoints.

BTW: Here is what I was thinking in terms of repository dependency hierarchy:

repo-hierarchy

For the -lib repositories, it may be useful to do version negotiation at the config level here, so we can have, e.g.: two versions of lachesis running side by side so we can:

  1. experiment with upgrade scenarios;
  2. benchmark changes
  3. find bugs present in one version but absent in others (consensus-lib would implement generic consensus tests)

The shared common repository [repositories?] will do regular things like: logging, error handling, configuration handling and related, using proper libraries for this, but setting constants like log format, ensuring consistency across all our code bases.

devintegral commented 5 years ago

As mentioned above, at this stage of development, divided into repositories can be premature and costly. Looks like dividing into go-packages in one repository will be enough for now. But maybe I just haven't information about implementation of the Raft, Paxos and Swirlds, who will work on it? Anyway, for the beginning you need to try identify common interfaces by which will be possible to divide the repository in the future. Also, as I see, there is no block that will start the service from specified in the configuration consensus and transport.

andrecronje commented 5 years ago

@Maxime2 is the following the most recent implementation of Jiho and Sang-min's work; https://github.com/Fantom-foundation/go-lachesis/tree/maxime-21-feb-lachesis-clotho

There are three primary objectives currently which are causing conflicts.

  1. Consensus -> Clotho/Atropos selection, @Maxime2 can you share the implementation?
  2. PoS -> As supplied in this PR
  3. Dynamic Participation -> As proposed by Martin and implemented by @dev10

@devintegral can you do a diff on @Maxime2 consensus implementation to see if it can be merged into this? The PR itself looks good, we just need to merge 1 and 2 and then relook at 3.

@Maxime2 & @devintegral is it possible to merge these two? That would be first prize.

devintegral commented 5 years ago

@andrecronje, it is impossible to merge 1 and 2, because they have different approaches: predetermined vs non predetermined peers list, on-going vs on-demand events processing, etc. We should drop one approach to merge code into single package. I suggest to have two consensus packages: 'src/poset' and 'src/posposet' at master, and choose one later, when we've got full knowlidge about their strengths.

I see it is possible to merge 1 and 3 into 'src/poset', because 3 is based on 1.

Maxime2 commented 5 years ago

@andrecronje Here is how I see the state of things at the moment:

The latest changes as of today are in this branch: https://github.com/Maxime2/lachesis/tree/20-mar-lachesis-clotho-atropos . It is a work in progress.

Having played with it I would say it works much faster than previous poset implementation, is reaching consensus to atropos on 15, 25, 35 and 65 nodes (juts have not checked on more nodes, but think it will work).

However we found several issues with lachesis using this implementation which need to be resolved before we could make final consensus to sort all events uniformly on all nodes:

I perceive any modifications to lachesis either it is dynamic participation either PoS premature at this stage as lachesis itself is not finalised and its architecture may be substantially changed in future development. In addition, I believe, these modification would need further research into correctness of these modification (e.g. these modifications do not break consensus and all nodes have the very same order of events in consensus).

BTW: May I ask for PoS task specification so I have it in mind when developing Jiho and Sang-min's prototype?

devintegral commented 5 years ago

@Maxime2, you have not to share your efforts on Jiho and Sang-min's implementation with posposet implementation. Posposet is pretty independent package, and I'm in charge of its tests. All I want is announce my reseach direction and share results into master branch as soon as possible to prevent a double efforts.

So, what is your objection against both poset and posposet in master at the moment?

quan8 commented 5 years ago

@Maxime2 please review the PR 161 and check in it.

For Sam's suggestion, I think it's not suitable at this stage. This repos is only for lachesis consensus and its variants.

SamuelMarks commented 5 years ago

Thanks for all your input.

@quan8 you say this repo is just for lachesis, but if you look through the codebase and its PRs, there is clearly more going on. Abstracting out the consensus layer will enable the other layers to evolve independently, as long as they maintain the same interface.

Additionally it will allow benchmarking and debugging with different versions, expansion of use-cases—implementing Raft and/or Paxos will enable private, trusted blockchain scenarios—and altogether better modularity.

There will be a major rearchitecture at some point, the only questions on the table are: how & when.


In terms of the mutually exclusive work by @Maxime2 and @devintegral, I would like to you both to suggest ways of extracting out this capability to 1 or 2 other repositories, and hiding it behind a conditional (so that we can switch between these two implementations). In the process, we will naturally need to make many of the dependent components more abstract. So would be good to get your insights, rather than just me going through with the risk of making many components over/under abstract.

Thanks

Maxime2 commented 5 years ago

@devintegral , please announce your research direction, what is PoS task requirements and specification. If posposet is pretty independent package why it replicates all functionality of poset, - this doesn't not look like a good architectural decision if you want to avoid efforts duplication (putting aside it is an improved, in therms of code quality, version of lachesis known to have problem with consensus at scale and some correctness flaws).

So the main objections to posposet: it is premature optimisation and modification to the lachesis algorithm which came at time of other modifications into the same code aiming to solve found problem with lachesis algorithm; it seems was not tested if it can reach consensus on 15+ nodes; it seems slower than new prototype.

@SamuelMarks , before suggesting any changes in attempt to join (or split up) it is better to know what PoS task is about, its specification and requirements, which I am not aware yet. As well the information about other possible uses of lachesis would be appreciated and beneficial for lachesis development.

devintegral commented 5 years ago

@Maxime2, you will get PoS task specification in section 7.1 of https://swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf. My research direction is described at #161 header: "make possible to build decentralized trustless network". To make a new src/posposet package is the only just architectural decision. Modification of src/poset seems not good because:

So, as previously mentioned, src/posposet is not premature optimisation, it is research. Agreed, src/posposet should be tested better. And it will be done. But it does not make sence until your team aproved #161 research direction. Posposet will make opportunity to compare it with your unpublished solution and combine the both into the best result.

Maxime2 commented 5 years ago

@devintegral section 7.1 of https://swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf is a generalisation of Swirlds. May I see your research document reflecting it into Lachesis, in particular, why PoS approach was reflected for voting but not for time calculation; and have you proved you have the same value of MinMax here https://github.com/Fantom-foundation/go-lachesis/blob/7ce8564514e144c4aa6bf3d634e383c96d55e0d6/src/posposet/poset.go#L303 on every node independently on selection of nodes forming the stake ?

You perhaps not get PoS task correctly, because it has nothing against fixed peer list, it assumes that each peer is not equal, and nothing else. So it could be predefined fixed peer list with assigned weight to each peer. Moreover cited paper says the PoS system is the same as original if you set weight 1 to each peer.

It is clear the re-run in case of stake change gives the biggest impact on performance when PoS is implemented. Have you done any research on possibility to avoid it?

Obviously the PoS specification doesn't require poset reengineering. I would prefer to change peers for that task rather than poset.

devintegral commented 5 years ago

@Maxime2, move this question to #161, please. This issue is not about src/posposet, it is about repository organization.

Maxime2 commented 5 years ago

@devintegral moved.

devintegral commented 4 years ago

outdated