karalabe commented 7 years ago

Changelog:

Apr 4, 2017:
- Mention the cascading proposal-execution corner case and its avoidance.
Mar 14, 2017:
- Expanded the Clique block authorization section, added a strategy proposal.
- Expanded the Clique signer voting section, added a strategy proposal.
Mar 13, 2017:
- Polished up the constants in the Clique consensus protocol spec.
- Added the two difficulty values and described in-turn/out-of-turn signing.
Mar 11, 2017:
- Added initial technical specs for the Clique PoA consensus protocol.
- Added checkpointing to reset votes and embed the list of signers into epoch headers.
- Reintroduced authorized signer vanity extra-data as a fixed 32 byte allowance.
Mar 6, 2017
- First proposal of the Rinkeby testnet and its PoA implementation ideas.

Clique proof-of-authority consensus protocol

Note, for the background and rationale behind the proposed proof-of-authority consensus protocol, please read the sections after this technical specification. I've placed this on top to have an easy to find reference for implementers without having to dig through the discussions.

We define the following constants:

EPOCH_LENGTH: Number of blocks after which to checkpoint and reset the pending votes.
- Suggested 30000 for the testnet to remain analogous to the mainnet ethash epoch.
BLOCK_PERIOD: Minimum difference between two consecutive block's timestamps.
- Suggested 15s for the testnet to remain analogous to the mainnet ethash target.
EXTRA_VANITY: Fixed number of extra-data prefix bytes reserved for signer vanity.
- Suggested 32 bytes to retain the current extra-data allowance and/or use.
EXTRA_SEAL: Fixed number of extra-data suffix bytes reserved for signer seal.
- 65 bytes fixed as signatures are based on the standard secp256k1 curve.
NONCE_AUTH: Magic nonce number 0xffffffffffffffff to vote on adding a new signer.
NONCE_DROP: Magic nonce number 0x0000000000000000 to vote on removing a signer.
UNCLE_HASH: Always Keccak256(RLP([])) as uncles are meaningless outside of PoW.
DIFF_NOTURN: Block score (difficulty) for blocks containing out-of-turn signatures.
- Suggested 1 since it just needs to be an arbitrary baseline constant.
DIFF_INTURN: Block score (difficulty) for blocks containing in-turn signatures.
- Suggested 2 to show a slight preference over out-of-turn signatures.

We also define the following per-block constants:

BLOCK_NUMBER: Block height in the chain, where the height of the genesis is block 0.
SIGNER_COUNT: Number of authorized signers valid at a particular instance in the chain.
SIGNER_INDEX: Index of the block signer in the sorted list of current authorized signers.
SIGNER_LIMIT: Number of consecutive blocks out of which a signer may only sign one.
- Must be floor(SIGNER_COUNT / 2) + 1 to enforce majority consensus on a chain.

We repurpose the ethash header fields as follows:

beneficiary: Address to propose modifying the list of authorized signers with.
- Should be filled with zeroes normally, modified only while voting.
- Arbitrary values are permitted nonetheless (even meaningless ones such as voting out non signers) to avoid extra complexity in implementations around voting mechanics.
- Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
nonce: Signer proposal regarding the account defined by the beneficiary field.
- Should be NONCE_DROP to propose deauthorizing beneficiary as a existing signer.
- Should be NONCE_AUTH to propose authorizing beneficiary as a new signer.
- Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
- Must not take up any other value apart from the two above (for now).
extraData: Combined field for signer vanity, checkpointing and signer signatures.
- First EXTRA_VANITY bytes (fixed) may contain arbitrary signer vanity data.
- Last EXTRA_SEAL bytes (fixed) is the signer's signature sealing the header.
- Checkpoint blocks must contain a list of signers (N*20 bytes) in between, omitted otherwise.
- The list of signers in checkpoint block extra-data sections must be sorted in ascending order.
mixHash: Reserved for fork protection logic, similar to the extra-data during the DAO.
- Must be filled with zeroes during normal operation.
ommersHash: Must be UNCLE_HASH as uncles are meaningless outside of PoW.
timestamp: Must be at least the parent timestamp + BLOCK_PERIOD.
difficulty: Contains the standalone score of the block to derive the quality of a chain.
- Must be DIFF_NOTURN if BLOCK_NUMBER % SIGNER_COUNT != SIGNER_INDEX
- Must be DIFF_INTURN if BLOCK_NUMBER % SIGNER_COUNT == SIGNER_INDEX

Authorizing a block

To authorize a block for the network, the signer needs to sign the block's hash containing everything except the signature itself. The means that the hash contains every field of the header (nonce and mixDigest included), and also the extraData with the exception of the 65 byte signature suffix. The fields are hashed in the order of their definition in the yellow paper.

This hash is signed using the standard secp256k1 curve, and the resulting 65 byte signature (R, S, V, where V is 0 or 1) is embedded into the extraData as the trailing 65 byte suffix.

To ensure malicious signers (loss of signing key) cannot wreck havoc in the network, each singer is allowed to sign maximum one out of SIGNER_LIMIT consecutive blocks. The order is not fixed, but in-turn signing weighs more (DIFF_INTURN) than out of turn one (DIFF_NOTURN).

Authorization strategies

As long as signers conform to the above specs, they can authorize and distribute blocks as they see fit. The following suggested strategy will however reduce network traffic and small forks, so it's a suggested feature:

If a signer is allowed to sign a block (is on the authorized list and didn't sign recently).
- Calculate the optimal signing time of the next block (parent + BLOCK_PERIOD).
- If the signer is in-turn, wait for the exact time to arrive, sign and broadcast immediately.
- If the signer is out-of-turn, delay signing by rand(SIGNER_COUNT * 500ms).

This small strategy will ensure that the in-turn signer (who's block weighs more) has a slight advantage to sign and propagate versus the out-of-turn signers. Also the scheme allows a bit of scale with the increase of the number of signers.

Voting on signers

Every epoch transition (genesis block included) acts as a stateless checkpoint, from which capable clients should be able to sync without requiring any previous state. This means epoch headers must not contain votes, all non settled votes are discarded, and tallying starts from scratch.

For all non-epoch transition blocks:

Signers may cast one vote per own block to propose a change to the authorization list.
Only the latest proposal per target beneficiary is kept from a single signer.
Votes are tallied live as the chain progresses (concurrent proposals allowed).
Proposals reaching majority consensus SIGNER_LIMIT come into effect immediately.
Invalid proposals are not to be penalized for client implementation simplicity.

A proposal coming into effect entails discarding all pending votes for that proposal (both for and against) and starting with a clean slate.

Cascading votes

A complex corner case may arise during signer deauthorization. When a previously authorized signer is dropped, the number of signers required to approve a proposal might decrease by one. This might cause one or more pending proposals to reach majority consensus, the execution of which might further cascade into new proposals passing.

Handling this scenario is non obvious when multiple conflicting proposals pass simultaneously (e.g. add a new signer vs. drop an existing one), where the evaluation order might drastically change the outcome of the final authorization list. Since signers may invert their own votes in every block they mint, it's not so obvious which proposal would be "first".

To avoid the pitfalls cascading executions would entail, the Clique proposal explicitly forbids cascading effects. In other words: Only the beneficiary of the current header/vote may be added to/dropped from the authorization list. If that causes other proposals to reach consensus, those will be executed when their respective beneficiaries are "touched" again (given that majority consensus still holds at that point).

Voting strategies

Since the blockchain can have small reorgs, a naive voting mechanism of "cast-and-forget" may not be optimal, since a block containing a singleton vote may not end up on the final chain.

A simplistic but working strategy is to allow users to configure "proposals" on the signers (e.g. "add 0x...", "drop 0x..."). The signing code can then pick a random proposal for every block it signs and inject it. This ensures that multiple concurrent proposals as well as reorgs get eventually noted on the chain.

This list may be expired after a certain number of blocks / epochs, but it's important to realize that "seeing" a proposal pass doesn't mean it won't get reorged, so it should not be immediately dropped when the proposal passes.

Background

Ethereum's first official testnet was Morden. It ran from July 2015 to about November 2016, when due to the accumulated junk and some testnet consensus issues between Geth and Parity, it was finally laid to rest in favor of a testnet reboot.

Ropsten was thus born, clearing out all the junk and starting with a clean slate. This ran well until the end of February 2017, when malicious actors decided to abuse the low PoW and gradually inflate the block gas limits to 9 billion (from the normal 4.7 million), at which point sending in gigantic transactions crippling the entire network. Even before that, attackers attempted multiple extremely long reorgs, causing network splits between different clients, and even different versions.

The root cause of these attacks is that a PoW network is only as secure as the computing capacity placed behind it. Restarting a new testnet from zero wouldn't solve anything, since the attacker can mount the same attack over and over again. The Parity team decided to go with an emergency solution of rolling back a significant number of blocks, and enacting a soft-fork rule that disallows gas limits above a certain threshold.

While this solution may work in the short term:

It's not elegant: Ethereum supposed to have dynamic block limits
It's not portable: other clients need to implement new fork logic themselves
It's not compatible with sync modes: fast and light clients are both out of luck
It's just prolonging the attacks: junk can still be steadily pushed in ad infinitum

Parity's solution although not perfect, is nonetheless workable. I'd like to propose a longer term alternative solution, which is more involved, yet should be simple enough to allow rolling out in a reasonable amount of time.

Standardized proof-of-authority

As reasoned above, proof-of-work cannot work securely in a network with no value. Ethereum has its long term goal of proof-of-stake based on Casper, but that is heavy research so we cannot rely on that any time soon to fix today's problems. One solution however is easy enough to implement, yet effective enough to fix the testnet properly, namely a proof-of-authority scheme.

Note, Parity does have an implementation of PoA, though it seems more complex than needed and without much documentation on the protocol, it's hard to see how it could play along with other clients. I welcome feedback from them on this proposal from their experience.

The main design goals of the PoA protocol described here is that it should be very simple to implement and embed into any existing Ethereum client, while at the same time allow using existing sync technologies (fast, light, warp) without needing client developers to add custom logic to critical software.

Proof-of-authority 101

For those not aware of how PoA works, it's a very simplistic protocol, where instead of miners racing to find a solution to a difficult problem, authorized signers can at any time at their own discretion create new blocks.

The challenges revolve around how to control minting frequency, how to distribute minting load (and opportunity) between the various signers and how to dynamically adapt the list of signers. The next section defines a proposed protocol to handle all these scenarios.

Rinkeby proof-of-authority

There are two approaches to syncing a blockchain in general:

The classical approach is to take the genesis block and crunch through all the transactions one by one. This is tried and proven, but in Ethereum complexity networks quickly turns out to be very costly computationally.
The other is to only download the chain of block headers and verify their validity, after which point an arbitrary recent state may be downloaded from the network and checked against recent headers.

A PoA scheme is based on the idea that blocks may only be minted by trusted signers. As such, every block (or header) that a client sees can be matched against the list of trusted signers. The challenge here is how to maintain a list of authorized signers that can change in time? The obvious answer (store it in an Ethereum contract) is also the wrong answer: fast, light and warp sync don't have access to the state during syncing.

The protocol of maintaining the list of authorized signers must be fully contained in the block headers.

The next obvious idea would be to change the structure of the block headers so it drops the notions of PoW, and introduces new fields to cater for voting mechanisms. This is also the wrong answer: changing such a core data structure in multiple implementations would be a nightmare development, maintenance and security wise.

The protocol of maintaining the list of authorized signers must fit fully into the current data models.

So, according to the above, we can't use the EVM for voting, rather have to resort to headers. And we can't change header fields, rather have to resort to the currently available ones. Not much wiggle room.

Repurposing header fields for signing and voting

The most obvious field that currently is used solely as fun metadata is the 32 byte extra-data section in block headers. Miners usually place their client and version in there, but some fill it with alternative "messages". The protocol would extend this field to with 65 bytes with the purpose of a secp256k1 miner signature. This would allow anyone obtaining a block to verify it against a list of authorized signers. It also makes the miner section in block headers obsolete (since the address can be derived from the signature).

Note, changing the length of a header field is a non invasive operation as all code (such as RLP encoding, hashing) is agnostic to that, so clients wouldn't need custom logic.

The above is enough to validate a chain, but how can we update a dynamic list of signers. The answer is that we can repurpose the newly obsoleted miner field and the PoA obsoleted nonce field to create a voting protocol:

During regular blocks, both of these fields would be set to zero.
If a signer wishes to enact a change to the list of authorized signers, it will:
- Set the miner to the signer it wishes to vote about
- Set the nonce to 0 or 0xff...f to vote in favor of adding or kicking out

Any clients syncing the chain can "tally" up the votes during block processing, and maintain a dynamically changing list of authorized signers by popular vote. ~~The initial set of signers can be given as genesis chain parameters (to avoid the complexity of deploying an "initial voters list" contract in the genesis state).~~

To avoid having an infinite window to tally up votes in, and also to allow periodically flushing stale proposals, we can reuse the concept of an epoch from ethash, where every epoch transition flushes all pending votes. Furthermore, these epoch transitions can also act as stateless checkpoints containing the list of current authorized signers within the header extra-data. This permits clients to sync up based only on a checkpoint hash without having to replay all the voting that was done on the chain up to that point. It also allows the genesis header to fully define the chain, containing the list of initial signers.

Attack vector: Malicious signer

It may happen that a malicious user gets added to the list of signers, or that a signer key/machine is compromised. In such a scenario the protocol needs to be able to defend itself against reorganizations and spamming. The proposed solution is that given a list of N authorized signers, any signer may only mint 1 block out of every K. This ensures that damage is limited, and the remainder of the miners can vote out the malicious user.

Attack vector: Censoring signer

Another interesting attack vector is if a signer (or group of signers) attempts to censor out blocks that vote on removing them from the authorization list. To work around this, we restrict the allowed minting frequency of signers to 1 out of N/2. This ensures that malicious signers need to control at least 51% of signing accounts, at which case it's game over anyway.

Attack vector: Spamming signer

A final small attack vector is that of malicious signers injecting new vote proposals inside every block they mint. Since nodes need to tally up all votes to create the actual list of authorized signers, they need to track all votes through time. Without placing a limit on the vote window, this could grow slowly, yet unbounded. The solution is to place a ~~moving~~ window of W blocks after which votes are considered stale. ~~A sane window might be 1-2 epochs.~~ We'll call this an epoch.

Attack vector: Concurrent blocks

If the number of authorized signers are N, and we allow each signer to mint 1 block out of K, then at any point in time N-K+1 miners are allowed to mint. To avoid these racing for blocks, every signer would add a small random "offset" to the time it releases a new block. This ensures that small forks are rare, but occasionally still happen (as on the main net). If a signer is caught abusing it's authority and causing chaos, it can be voted out.

Notes

Does this suggest we use a censored testnet?

So and so. The proposal suggests that given the malicious nature of certain actors and given the weakness of the PoW scheme in a "monopoly money" network, it is better to have a network with a bit of spam filtering enabled that developers can rely on to test their programs vs. to have a wild wild west chain that dies due to its uselessness.

Why standardize proof-of-authority?

Different clients are better at different scenarios. Go may be awesome in capable server side environments, but CPP may be better suited to run on an RPI Zero. Having a possibility to mix clients in private environments too would be a net win for the ecosystem, as well as being able to participate in a single spamless testnet would be a win for everyone at large.

Doesn't manual voting get messy?

This is an implementation detail, but signers may implement contract based voting strategy leveraging the full capabilities of the EVM, only pushing the results into the headers for average nodes to verify.

Clarifications and feedback

This proposal does not rule out clients running a PoW based testnet side by side, whether Ropsten or a new one based on it. The ideal scenario would be that clients provide a way to attach to both PoW as well as PoA based test networks (https://github.com/ethereum/EIPs/issues/225#issuecomment-284378473).
- The protocol parameters although can be made configurable at client implementers' discression, the Rinkeby network should be as close to the main network as possible. That includes dynamic gas limits, variable block times around 15 seconds, gas prices and such (https://github.com/ethereum/EIPs/issues/225#issuecomment-284380575).
- The scheme requires that at least K signers are online at any time, since that is the minimum number required to ensure "minting" diversity. This means that if more than K drop off, the network stalls. This should be solved by ensuring the signers are high-uptime machines and failing ones should be voted out in a timely fashion before too many failures occur (https://github.com/ethereum/EIPs/issues/225#issuecomment-284413381).
- The proposal does not address "legitimate" spam, as in an attacker validly spending testnet ether to create junk, however without PoW mining, an attacker may not be able to obtain infinite ether to mount the attack in the first place. One possibility would be to have a faucet giving out ether based on GitHub (or whatever else) accounts in a limited fashion (e.g. 10 / day) (https://github.com/ethereum/EIPs/issues/225#issuecomment-284436932).
- A suggestion was made to create checkpoint blocks for every epoch that contains a list of authorized signers at that point in time. This would allow light clients at a later point to say "sync from here" without needing to start from the genesis. This could be added to the extradata field as a prefix before the signature (https://github.com/ethereum/EIPs/issues/225#issuecomment-284725467).

miohtama commented 7 years ago

As the word censored is in cursive, I'd like to point out that while this proposal proposes a new public testnet with less decentralized characteristics, it's possible for anyone to run their own PoW testnet. Then you bear the infrastructure cost of doing so and the proposal does not limit your ability for this any way. This has been true from Ethereum day zero, as Ethereum clients have been very user friendly for running your private testnet.

karalabe commented 7 years ago

Just to add on that, the proposal also does not restrict clients to run this exclusively. The proposal can run side-by-side with the current testnet, so users would be free to chose between the PoW Ropsten or the PoA Rinkeby.

christoph2806 commented 7 years ago

We greatly support this approach! As a DApp Developer, we urgently need a public safe and reliable testnet, which obviously cannot be secured by PoW. DApps are beginning to interact heavily - only to mention status.im, metamask, uport, or other wallets - and only on a broadly accepted public testnet all projects will be present and able to test dependencies to others. For similar reasons, the new testnet should be as similar as possible to the mainnet - only then it can serve as a valid reference for developement. I'd prefer:

similar gas limit
similar block time
similar gas price
and for each parameter, a similar statistical distribution only then you can consider an application which runs on testnet as "tested". I appreciate the parity solution with kovan, because it gives some relief for short term, but I would like to encourage all involved parties to work together on a shared solution.

karalabe commented 7 years ago

@christoph2806 Definitely, added to the proposal's clarification section.

Nashatyrev commented 7 years ago

With time some signers can go offline. Couldn't it be the case when at some block all of (N-K) signers who can mint the next block are stale and the network stuck?

karalabe commented 7 years ago

For my proposal the network operators should ensure that stale signers are removed/replaced in a timely fashion. For testnet purposes this would probably be only a handful of signers that we can guarantee uptime.

hrishikeshio commented 7 years ago

How will the ether be distributed? It is important since spammer can try to get as much ether as possible from various sources and then use it to spam the network.

karalabe commented 7 years ago

@hrishikeshio The issue with Ropsten was that the attacker minted tens of thousands of blocks, producing huge reorgs and pushing the gas limit up to 9B. These two scenarios could be avoided since only signers can mint blocks, so they could also retain some sanity limits.

The proposal does not specify any means for spam filtering for individual transactions as that is a new can of worms. I'll have to think a bit how best to solve that issue (around miner strategies), but limiting ether availability on a testnet is imho a bad idea. We want to be as inclusive as possible.

karalabe commented 7 years ago

One possible solution would be to have a faucet that grants X ether / Y time (e.g 10 / day) but is bound to some OAuth protocol that has proper protection against mass account creation (e.g. github accout, email address, etc).

3esmit commented 7 years ago

Snippet to claim a github user ownership to an ethereum address

contract GitHubOracle is  usingOraclize {
    //constant for oraclize commits callbacks
    uint8 constant CLAIM_USER = 0;
    //temporary storage enumerating oraclize calls
    mapping (bytes32 => uint8) claimType;
    //temporary storage for oraclize user register queries
    mapping (bytes32 => UserClaim) userClaim;
    //permanent storage of sha3(login) of github users
    mapping (bytes32 => address) users;
    //events
    event UserSet(string githubLogin, address account);
    //stores temporary data for oraclize user register request
    struct UserClaim {
        address sender;
        bytes32 githubid;
        string login;
    }

    //register or change a github user ethereum address
    function register(string _github_user, string _gistid)
     payable {
        bytes32 ocid = oraclize_query("URL", strConcat("https://gist.githubusercontent.com/",_github_user,"/",_gistid,"/raw/"));
        claimType[ocid] = CLAIM_USER;
        userClaim[ocid] = UserClaim({sender: msg.sender, githubid: sha3(_github_user), login: _github_user});
    }
  //oraclize response callback
    function __callback(bytes32 _ocid, string _result) {
        if (msg.sender != oraclize_cbAddress()) throw;
        uint8 callback_type = claimType[_ocid];
        if(callback_type==CLAIM_USER){
            if(strCompare(_result,"404: Not Found") != 0){    
                address githubowner = parseAddr(_result);
                if(userClaim[_ocid].sender == githubowner){
                    _register(userClaim[_ocid].githubid,userClaim[_ocid].login,githubowner);
                }
            }
            delete userClaim[_ocid]; //should always be deleted
        }
        delete claimType[_ocid]; //should always be deleted
    }
    function _register(bytes32 githubid, string login, address githubowner) 
     internal {
        users[githubid] = githubowner;
        UserSet(login, githubowner);
    }
}

User create a gist with his public address and call register passing _github_user + _gistid

From https://github.com/ethereans/github-token/blob/master/contracts/GitHubToken.sol

mightypenguin commented 7 years ago

There could be a light quick proof of stake system where (like the github oraclize above) people need 5ETH locked to a live net contract address that then allows them to be on the testnet. Misbehave, and the ethereum foundation (or who ever runs it) confiscates your eth.

karalabe commented 7 years ago

Yeah, side chains are an interesting idea but those are a whole new can of worms :)

maurycyp commented 7 years ago

Two thoughts:

Last week, INFURA launched a (private but publicly available) chain called INFURAnet (with INFURA running all the authorities) to provide a usable test network in the face of the Ropsten issues. It was obviously based on Parity but we would feel better if PoA was a standard and compatible feature across all clients. Therefore, we support this EIP.

Additionally, if Ropsten is replaced with a PoA network, we would be happy to run one of the authorities.

AlexeyAkhunov commented 7 years ago

What about still using PoW on the testnet, but with slightly modified parameters:

1) Block Reward = 0 2) Gas price is fixed to certain value 3) There is a hard cap on the gas limit in a block 4) Faucet gives testnet Ether only to accounts that have Ether in the same account on the main net, and that Ether is at least 24 hours old. Each account only receives test Ether once. Or some other limitation of this sort, which will allow faucet to be automatic, but will limit sybil attacks.

Hopefully, implementation could be much easier than Proof Of Authority

EDIT: Another idea - can Block Reward be negative? Meaning that mining actually cost Test Ether. That allows implementing sort of "Proof Of Authority" trivially, by simply distributing large amounts of test Ether. It also means that if Test Ether is dished out periodically, the maintainers of the test net can disallow abusive miners by not giving them the next tranche of test Ether

karalabe commented 7 years ago

The issue with your modified PoW scheme is that it still permits creating huge reorgs by mining lots of blocks, even if without reward.

The second proposal doesn't solve this issue either as a malicious user might accumulate a lot of ether first, then create many many parallel chains. All will be valid since he does have the funds, and there's no way to take it away. Arguably more stable than the first proposal, but doing negative rewards might break clients unexpectedly as I don't think most codebases catered for this possibility.

Btw, the zero block reward is a nice idea for PoA too, as it prevents a rogue signer / leaked key from ruining the chain with accumulated funds.

AlexeyAkhunov commented 7 years ago

@karalabe Thanks! What I meant with the negative rewards - the maintainer of the network gives out enough Test Eth to current miner authorities to mine, lets say, for a week. After the week, the maintainer looks who needs a top-up, and only gives a top up to miners who behaved well. For those who did not behave well, the payouts simply stop.

AlexeyAkhunov commented 7 years ago

@karalabe Ah, I got your point about the parallel chains now. In that case, there needs to be some kind of regular expiration of Test Eth :)

jaekwon commented 7 years ago

Here's GoEthereum on Tendermint.

https://github.com/tendermint/ethermint

The goal is to make as much of GoEthereum as compatible as possible.

Come to #ethermint on the Tendermint slack for discussions.

We have some upstream patches that would make Ethermint much cleaner. See the bottom of https://github.com/tendermint/ethermint/pull/42/files

jaekwon commented 7 years ago

We're pushing GoEthereum to high tx limits and uncovering some issues.

karalabe commented 7 years ago

Just to mention a proposal by @frozeman and @fjl of adding the set of signers to the extra-data field of every X block to act as a checkpoint. This wouldn't be useful now, but it would permit anyone trivially adding a logic to "sync form H(X)" where H(X) is the hash of a checkpoint block.

The added benefit is that this would allow the genesis block to store the initial set of signers and we wouldn't need extra chain configuration parameters.

holiman commented 7 years ago

Here's a suggested protocol change: https://gist.github.com/holiman/5e021b24a7bfec95c8cc84b97e44e45a

It was a bit too long for fitting in a comment.

karalabe commented 7 years ago

@holiman To react a bit to the proposal here too, I see one problem that's easy-ish to solve, another that's hard:

Your scheme must also ensure that blocks cannot be minted like crazy, otherwise the difficulty becomes irrelevant. This can be done with the same "min 15 seconds apart" guarantee that the original proposal had.

The harder part is that with no guarantee on signer ordering/frequency (only relying on the difficulty for chain quality/validation), malicious signers can mine very long chains that aren't difficult enough to beat the canonical, however the nodes cannot know this before processing them. And since creating these chains is mostly free in a PoA world, malicious signers can keep spamming with little effort.

The original proposal had a guarantee that the majority of the signers agreed at some point that a chain is valid (even if it was reorged afterwards), so minority malicious miners can only feed made up chains of N/2 blocks.

The difficulty idea is elegant btw, just not sure how yet to make use of it :)

keorn commented 7 years ago

If you do not mind somewhat relying on UNIX time and longer block times when validators are down, then Aura (in Parity) uses something like that:

time is divided into steps, the current step is t / step_duration
the primary for step is step % length(validators)
the header seal is a list of two values: step and signature (step is redundant and can be removed in a future version)
the total difficulty or as we refer to it "chain score" is set to be (using appropriate differencing to obtain block difficulty): U128_max * height - step

Validation: block at a given step can be only signed by the primary, only first block for a given step is accepted (if a second is received, a vote to remove the authority should be issued), block can arrive at most 1 step ahead.

Validator set can be altered in the way @karalabe proposed.

Either way we will attempt to implement whichever solution is elected.

karalabe commented 7 years ago

I'm not too fond of relying on time. Using @holiman 's proposal of calculating "your turn" based only on block height seems a bit better in respect as nodes don't have to be synced.

Any particular reason for having the chain difficulty calculated like that instead of just the height of the chain for example? What does this more complex formula gain you?

The issue I see with Aura's turn based scheme is that if a few signers drop off (which can be only natural in an internet scale system), then the chain dynamics would become quite irregular, with "gaps" in the minting time; versus my proposal where multiple signers can fill in for those that dropped.

karalabe commented 7 years ago

If I understand correctly, the idea in the difficulty algorithm is to score those chains higher that have the most signers signing at the correct turn. So chains that skip blocks are scored less vs. those that include all signers.

What happens in scenarios where blocks are minted in step, but propagated later after the step ends? Or if some signers receive the next block in time, while some signers receive it a bit later after the step ended?

karalabe commented 7 years ago

I've updated the proposal with a tech spec section describing the proposed PoA protocol itself. It's still missing a few details around signing (notably the 1-out-of-K block constraint), and I've yet to figure out the difficulty calculation.

Also I split off the PoA protocol from the testnet itself naming wise as I'd like to keep the two concepts separated to avoid confusion. Using metro station names for the testnets is fine, but for a reusable PoA scheme I wanted something a bit more "mundane" and/or obvious.

The names are still up for finalization. The Clique name for the PoA scheme (best until now) was suggested by @holiman .

VoR0220 commented 7 years ago

Id recommend using the Ethermint or Eris DB permissioning native contract or both. They've both been tested extensively and both would not require reinventing the wheel. Furthermore we're all friends here and have done the heavy leg work here so...why not?

karalabe commented 7 years ago

It's hard to evaluate such a proposal without any details. I personally am not familiar with either how the work, so I cannot comment on their feasibility.

My main design goals here are to be easy to add to any client and support current techs (fast, light, warp sync) without invasive changes.

Can those consensus engines be plugged into all clients? Can they run on mobile and embedded devices? Are they fully self contained without external dependencies? Can they achieve consensus header only? Are they compatible licensing wise with all clients? These all are essential requirements I've tried to meet.

I'm happy to consider them, but you need to provide a lot more detail to evaluate based upon.

VoR0220 commented 7 years ago

Absolutely.

So both use a tendermint consensus Proof of Stake, that is detailed here:

https://github.com/tendermint/tendermint/wiki/Byzantine-Consensus-Algorithm

As for the pluggability of the algorithm, it's been proven to be quite doable, in fact, Parity has already done it:

https://github.com/ethcore/parity/blob/ade5a13f5bad745b4200ececde42aa219ad768ae/json/src/spec/engine.rs

And ethermint already implements this through geth in a way (I wouldn't be the one to give the details, that would be something for @jaekwon or @ebuchman to explain)

https://github.com/tendermint/ethermint

As for Eris-DB and your attempt at permissioning by way of Proof Of Authority, we simply utilize the above BFT consensus algorithm and on top of that utilize a native contract (not dissimilar to the current cryptographic addresses such as SHA256, RIPEMD-160, etc.) to implement a permissioning scheme amongst the validators.

While we have our own version of the EVM that is much more stripped down than Geth, I don't think it would be something difficult to make a modular go package for ease of implementation (CC @silasdavis ):

https://github.com/eris-ltd/eris-db/blob/master/manager/eris-mint/evm/snative.go#L73

The above ^^^ could be implemented in a way through geth via some tinkering with this function in geth:

https://github.com/ethereum/go-ethereum/blob/master/core/vm/contracts.go#L33

Both solutions are written in Golang so there is surely a way to make them somewhat compatible. Again. Trying to find a way to work together so ya'll can keep your focus ;)

yaronvel commented 7 years ago

Maybe instead of all these fancy ideas just ask Bitcoin how they are able to have functional PoW testnet? Hint: block size (i.e., gas limit) is bounded.

But off-course we cannot allow testnet to have different behavior than mainnet. So let's us PoA instead. Exactly as in mainnet.

holiman commented 7 years ago

We could have a bounded-limit PoW network as well. Let's have several options.

cdetrio commented 7 years ago

Could the PoA testnet be started from a state snapshot taken from the PoW testnet (perhaps from the Ropsten bounded-gas-limit soft-fork block)? And if the PoA configuration uses the same EIP155 CHAIN_ID=3 as Ropsten, then transactions can be replayed on both the PoA chain and the PoW chain. Replaying transactions on both testnets might be convenient for deploying contracts etc.

karalabe commented 7 years ago

I'm not convinced that's a good idea.

Starting from a huge snapshot would require that all clients implement snapshotting, or at least whatever's needed to load it in the genesis. Geth afaik can do it, but I'm not sure whether the others support it or not.
One of the spam protection feature planned (and AFAIK also present in Kovan) was to try and limit the supply of ether so malicious actors cannot stockpile too large amounts. This would be utterly broken if we loaded up a Ropsten snapshot where I'm assuming our original attacker had a huge pile already (or others for that matter).
Ropsten is probably also dead and I can imagine it would get a reboot too. Still though, relaying transactions between these two networks (irrelevant whether reboot or not) would just be messy since there would be a ton of funds on ropsten for mining, so there would be a constant influx of ropsten transactions that can't be executed anyway.

Imho it's nicer to start with a clean slate.

yaronvel commented 7 years ago

cdetrio does not suggest a snapshot feature (as far as I understand). Just have same network id and replay all ropsten txs until the attack.

I don't understand why everyone keep claiming the amount of ether the attacker had was the problem. IMO it was his (relatively) huge mining power. If gas limit would stayed at 4.7M he couldn't spam as much.

karalabe commented 7 years ago

PoA doesn't have mining rewards and the block miners would be different, the the transactions couldn't be replayed as is, since the accounts wouldn't have the funds.

Noone claimed the ether was the problem. We highlighted that with infinite ether, you can repro the same problem in a PoA network too without much mining power if blocks are not limited.

yaronvel commented 7 years ago

Technically you can make a fresh account with many ethers, and after every ropsten block was mined add a tx that gives the miner the block reward and fees. I am not suggesting to do it. Just wondering if this is what @cdetrio had in mind.

If ether amount is not a problem (given block size is bounded), why do you insist to verify an indentity before giving away ether?

karalabe commented 7 years ago

I personally don't want to place a limit on the block size. Looking at bitcoin, they have huge problems because of that limit. Even though this is a testnet, I'd like to retain the core concepts of Ethereum (yes, I know PoA isn't mainnet, but Ethereum never wanted to settle on PoW anyway, so I see no issue with pushing towards dropping PoW).

yaronvel commented 7 years ago

Do you think it wise not to have PoW testnet at all, while mainnet is still PoW?

Personally, I have an agenda here. I am part of the smartpool.io team. And it will be hard to deploy it on mainnet before we can show people it works on testnet (we have our own private network but it is not the same).

I don't know how many more people need PoW feature. I think metropolis has some changes in uncle mechanism. How can they test it without a PoW testnet?

karalabe commented 7 years ago

It's fine to have a PoW testnet too beside a PoA one to test out forks. We can go down the block limiting route on that.

karalabe commented 7 years ago

Just wanted to ping the thread that I've finished writing up the proposal. We also have a prototype implementation in go-ethereum https://github.com/ethereum/go-ethereum/pull/3753, in the consensus/clique package (I didn't link the commit because occasionally I force push the PR during development).

I'll spend the next few days trying to put together a small beta-test network and also to write up some tests to validate that everything works correctly (mostly around voting and dynamic signer updates).

karalabe commented 7 years ago

@VoR0220 I'm still uncertain whether I understand your two proposals, but I did notice a few things that made me uncertain whether they would be appropriate.

Tendermint seems to rely on a complex cross node interaction to reach consensus on the final block, which inherently means added network complexity. Eris DB seems to be based on a slimmed down EVM, which inherently means that stateless syncs (fast, light) cannot verify the chain. Did I misunderstand something?

All in all though to support my proposal or any of your proposals, clients need to have support for some baseline pluggable consensus engines, so either approach requires work from core devs. I'm not sure about the other proposals, but at least after implementing mine I can guarantee that the PoA and previous PoW can be done without too invasive rewrites, although non trivial to say the truth.

arkpar commented 7 years ago

Here's an alternative idea: Keep the list in the contract for flexibility. The contract emits events when the list changes. Light/fast sync can examine event blooms and transaction receipts and downloads proofs for the changes. The proofs are also added to the warp snapshot.

karalabe commented 7 years ago

The idea is not bad per se, but it blows up the complexity of the proposal significantly:

Light clients don't have access to receipts during sync, so every time the event bloom looks like there's something there, the light client needs to retrieve it. This means that sync code and consensus code all of a sudden get tied together, since sync needs to occasionally pull in extra data. This is quite a large can of worms to open up, especially since there might be much stricter resource constraints on light clients for network traffic, as well as serving nodes may throttle them on large downloads.

The scheme is susceptible to attack that fake "consensus updates" in the log bloom. E.g. I as an attacker can issue a transaction per block that emits some logs which map to the same bloom bits as the consensus contract events. This means that light clients will end up needing to pull in all recepts and a ton of state just to figure out it's a false alarm.

But perhaps most importantly, one of the core requirements of the original proposal was that it should be trivial to embed into other clients. Of course they do need to support some consensus engine pluggability, but based on the code in geth, the entire Clique consensus engine can be done (extremely commented) in 500-750 lines of code, fully self contained into two files. (My pr contains a lot of general cleanup and also reworking ethash in the mean time). The entire proposal depends on implementing a "header check", a "header preparer" and a "sign block" method, which are analogous to those needed by ethash. All else works just as is. Imho this is a very strong benefit that should not be discarded lightly.

arkpar commented 7 years ago

Light clients don't have access to receipts during sync

LES protocol supports GetReceipts message.

The scheme is susceptible to attack that fake "consensus updates" in the log bloom. E.g. I as an attacker can issue a transaction per block that emits some logs which map to the same bloom bits as the consensus contract events. This means that light clients will end up needing to pull in all recepts and a ton of state just to figure out it's a false alarm.

This would require reversing a Keccak hash, wouldn't it?

As for traffic increase, list modifications are expected to be rare enough for it to be negligible.

Does not look much harder to implement to me. Trivial for clients that don't support fast sync or light client protocol. And it does not impose a hard-coded governance scheme.

karalabe commented 7 years ago

LES protocol supports GetReceipts message.

That's a significant overhead to call during syncing.

This would require reversing a Keccak hash, wouldn't it?

The blooms don't use the full hash, only a few bytes from it, so it should be significantly easier to brute force. Given that the consensus contract's address wouldn't change, it shouldn't be too much of an effort to try and break it.

As for traffic increase, list modifications are expected to be rare enough for it to be negligible.

Not if I can attack it.

Trivial for clients that don't support fast sync or light client protocol.

Given that CPP is just working on adding fast sync and I assume light is next for many client implementations, that's just taking a shortcut now that will bite us hard in the long run.

karalabe commented 7 years ago

And it does not impose a hard-coded governance scheme.

That hard coded governance is PoA by majority consensus. I don't see a reason to make it more flexible than this.

fjl commented 7 years ago

I can see both sides for this:

The annoying part of scalable-ish PoA is managing authorised signers. Doing it with a contract is easier because the logic can be shared solidity code and arbitrary new signer management policies can be implemented later.

But implementing it as a contract also adds non-trivial development overhead now because blockchain syncing gets more complicated. @arkpar, I guess you could answer these:

How much work would it be to add clique PoA (as proposed here) to Parity?
How much work would it be to add contract-based PoA (as suggested by you) to Parity?

rphmeier commented 7 years ago

Contract-based PoA is already implemented in Parity, just without conveniences for light clients. Clique/Rinkeby probably wouldn't be a whole lot to implement, but @keorn can answer better.

I would favor a middle ground. A generic validators contract has one method: getValidators() -> [Address]. We can include the signed sha3(getValidators()) as part of the seal for any given block. Light clients can simply fetch fraud proofs when this changes. In the event that some malicious validators don't update the field even when getValidators() would be different, a mandate to follow the longest chain and an honest majority assumption is enough to ensure that the correct chain is synchronized to.

This will work most efficiently with infrequent changes in the validator contract. If they are epoch-based at around once per day, the overhead imposed on light clients synchronizing would not be very high, although there is a stronger availability requirement on the network to continue to store getValidators() state proofs for ancient transitions.

deanstef commented 7 years ago

Hi i've some questions on this:

If a Signer is 'in-turn' he can do whenever he wants, also sign an invalid block (e.g. a block with empty transactions). How the protocol manage this is not clear.
How signers collect transactions? May happens that two signers in different time broadcast blocks with the same transactions? I mean, clients transactions are broadcast to all signers or to just one of them?

Thanks

karalabe commented 7 years ago

A signed block doesn't make it valid. All the yellow paper rules still apply, the signature is just one more requirement. Empty block are not invalid, a signer is free not to include any transactions.
Broadcasting and mining is the same as for all consensus engines. Transactions propagate all over the network, signers aggregate them and include them in blocks when it's their turn (or possibly out of turn too for less difficulty).

ethereum / EIPs

Clique PoA protocol & Rinkeby PoA testnet #225

Changelog:

Clique proof-of-authority consensus protocol

Authorizing a block

Authorization strategies

Voting on signers

Cascading votes

Voting strategies

Background

Standardized proof-of-authority

Proof-of-authority 101

Rinkeby proof-of-authority

Repurposing header fields for signing and voting

Attack vector: Malicious signer

Attack vector: Censoring signer

Attack vector: Spamming signer

Attack vector: Concurrent blocks

Notes

Clarifications and feedback