convergence fault in the presence of non-segwit signaling blocks

gmaxwell commented 7 years ago

Alice, Bob, Carol, Dave, etc. are 'btc1' miners and have begun enforcing segwit (bit1) signaling.

These miners are not connected directly to each other but only via the network of ordinary Bitcoin nodes, as is likely.

Mallory is an "unlimited" running miner (or at least pretending to be) and not signaling segwit.

Mallory creates a non-BIT1 signaling block.

A, B, C, D... all ignore her block because it doesn't signal.

Alice mines a block. But B, C, D, etc. won't hear it because it isn't the longest chain to the rest of the network.

Bob mines a block but A, C, D, etc. won't hear it... and so on.

Meanwhile Mallory and the other non-btc1 miners keep on mining more blocks, all collaborating on a single chain.

So the network breaks into as many forks as there are partitioned 'btc1' miners until a single btc1 miner overtakes all the non-btc1 miners combined, which may not even be possible.
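
The sequence above can be sketched as a toy model. This is a rough illustration only, under loud assumptions: a single ordinary node stands in for the whole non-enforcing network, blocks are compared by height alone (no parent links), and the names `OrdinaryNode`, `EnforcingMiner` and `simulate` are made up for this sketch and are not btc1 or Bitcoin Core code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    name: str
    height: int
    signals_bit1: bool


@dataclass
class OrdinaryNode:
    """Non-enforcing node: announces a block only if it beats its best height."""
    best_height: int = 0

    def accept_and_relay(self, block: Block) -> bool:
        if block.height > self.best_height:
            self.best_height = block.height
            return True
        return False  # same-height competitor: not announced to peers


@dataclass
class EnforcingMiner:
    """Toy 'btc1' miner: ignores any block that does not signal bit 1."""
    name: str
    best_height: int = 0
    heard: list = field(default_factory=list)

    def receive(self, block: Block) -> None:
        if not block.signals_bit1:
            return  # rejected by the enforcement rule
        self.heard.append(block.name)
        self.best_height = max(self.best_height, block.height)

    def mine(self) -> Block:
        self.best_height += 1
        return Block(f"{self.name}-{self.best_height}", self.best_height, True)


def simulate(miner_names=("Alice", "Bob", "Carol")):
    relay = OrdinaryNode()
    miners = [EnforcingMiner(n) for n in miner_names]

    def broadcast(block, sender=None):
        # Enforcing miners reach each other only through the ordinary node.
        if relay.accept_and_relay(block):
            for m in miners:
                if m is not sender:
                    m.receive(block)

    broadcast(Block("Mallory-1", 1, False))  # non-signaling block at height 1
    for m in miners:
        broadcast(m.mine(), sender=m)  # each competing block is also height 1
    return {m.name: m.heard for m in miners}, relay.best_height


heard, relay_height = simulate()
print(heard)         # every list is empty: no enforcing miner heard another's block
print(relay_height)  # the ordinary node is still on Mallory's chain at height 1
```

In this model each enforcing miner ends up alone on its own height-1 block, which is the one-fork-per-partitioned-miner outcome described above.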

Resolution is to either guarantee there are non-bit1 signaling miners before enforcing (kind of incompatible with the definition of 80% activation), or manually guarantee that the graph of enforcing miners is connected.

This requires an urgent response.

jameshilliard commented 7 years ago

Miners should be ok as long as they are connected to the relay network.

Per @TheBlueMatt:

Hey y'all,

Please keep in mind that neither James' segsignal nor Jeff's btc1 have
any logic to ensure they connect to other BIP 91 nodes. Because such
nodes are very few, and because other nodes will *not* relay blocks
which are not their best chain, it is very possible that your nodes will
not receive notifications of new blocks on the BIP 91 chain.

This means that any time a miner mines a block which does not meet the
BIP 91 requirements, the fork block which does will take a longer amount
of time to reach your nodes, if you see it at all!

Please, please make absolutely sure that your BIP 91-enforcing nodes
have connections (via -addnode=$NODE or addnode $NODE add) to other BIP
91 miners.

Additionally, the public FIBRE nodes will be adapted later today to
relay all blocks which they receive which are recent. This should be
sufficient to solve the issue as well.

In conclusion, MAKE SURE you are connected to the public FIBRE network
as well as other miners BEFORE the BIP 91 activation takes place. Keep
in mind that bitcoind is limited to 8 addnodes (this limit is separate
from the outbound connections limit).

Thanks and good luck,
Matt
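
For pool operators who keep their peer list in a script, here is a minimal sketch that turns a list of peers into `addnode=` lines for `bitcoin.conf` and refuses to go past the 8-addnode limit mentioned in the email. The helper name and the `.example` hostnames are placeholders invented for illustration, not real BIP 91 or FIBRE endpoints.

```python
MAX_ADDNODES = 8  # limit quoted in the email above


def addnode_conf_lines(peers):
    """Return bitcoin.conf 'addnode=' lines, refusing more than MAX_ADDNODES peers."""
    unique = list(dict.fromkeys(peers))  # drop duplicates, keep order
    if len(unique) > MAX_ADDNODES:
        raise ValueError(f"{len(unique)} peers given, limit is {MAX_ADDNODES}")
    return [f"addnode={peer}" for peer in unique]


# Hypothetical peers: replace with the real BIP 91 miners / FIBRE nodes you use.
peers = [
    "bip91-miner-a.example:8333",
    "bip91-miner-b.example:8333",
    "fibre-relay.example:8333",
]
print("\n".join(addnode_conf_lines(peers)))
```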

TheBlueMatt commented 7 years ago

Well there's gonna be some changes needed to the relay network to make it work, and ideally you wouldnt trust my rather ad-hoc network to stay online, but, yes, hopefully. Still, it is strongly, strongly recommended that non-miners do not run btc1/bip 91 until well after the bip 91 fork.

JaredR26 commented 7 years ago

This seems like an issue with BIP 148/91's approach, correct? The only reason signaling rejection was added was a request for compatibility with those; bit4 required signalling was never merged (I think).

> as there are partitioned 'btc1' miners until a single btc1 miner overtakes all the non-btc1 miners combined

In reality most of the big pools are direct peers and/or the fast backbone network.

That said, this does appear to be a legitimate concern, thank you for raising it Greg.

It seems that manually ensuring direct peering is the best resolution of this?

Is there a code approach that should have been taken instead (for future documentation/proposals if nothing else)?

Jared

TheBlueMatt commented 7 years ago

This isnt really an issue with BIP148, cause they have lots of nodes and good connectivity, this is, however, a major issue for btc1.

> In reality most of the big pools are direct peers and/or the fast backbone network.

Not really true anymore, sadly. This absolutely needs to change prior to activation.

jameshilliard commented 7 years ago

> This seems like an issue with BIP 148/91's approach, correct?

It's more of an issue just because there aren't many BIP91 nodes, but it should be manageable as long as pool operators make sure they are connected properly to other pools.

jgarzik commented 7 years ago

Agree it's an issue. The new DNS seeds do inject a heavier dose of 2x nodes; whether or not that has a positive impact depends on your situation (a weak help, IOW).

TheBlueMatt commented 7 years ago

@jameshilliard yes, this is why it is very strongly recommended that only miners (who should all have careful setups) run bip91/btc1 until well after the bip 91 fork!

@jgarzik there is code which makes sure that dnsseeds are not queried almost at all (except very first startup), so, no, that isnt just a weak help, its no help.

JaredR26 commented 7 years ago

Taking that statement to be true, another solution would be to add many more btc1 nodes, correct?

(In parallel with the manual peering changes)

TheBlueMatt commented 7 years ago

Not really, because the btc1 nodes dont contain the 14.2 fixes, so peer-finding is further broken (seriously, why?) and because you need time to propagate information about those nodes through the network.

gmaxwell commented 7 years ago

> The new DNS seeds do inject a heavier dose of 2x nodes;

What seeds add to your pool doesn't really do anything there because the ordinary seeds aren't filtering them out. They'll be no more likely to get selected than any other node. A quick test with 5 restarts shows every try resulted in a fully partitioned node.

BIP 148 doesn't appear to have this issue, and would actually heal this problem if the activation were concurrent. However it looks at least somewhat likely that bit4 will activate well ahead of BIP148, and if so they won't be a help.

Aside, your hardcoded seeds returning a small set of static results is a gross violation of the DNS seed policy copied over from the Bitcoin project.

earonesty commented 7 years ago

If 0.14.2 were merged in, it would allow the btc1 nodes to self-select a connected network, correct?

TheBlueMatt commented 7 years ago

@earonesty No, that would help a small amount, but its much too late to get the nodes to be well-propagated at this point.

earonesty commented 7 years ago

Well, maybe NYA miners should just run the UASF code itself. Seems like a safer way of activating. Then they can switch to the btc1 branch for the hard fork later.

jgarzik commented 7 years ago

Please keep the discussion technical and focused.

jheathco commented 7 years ago

Just a thought, but with regards to segwit2x BIP91 lock-in, couldn't the code be modified to time the orphaning of blocks to that of BIP148's activation rather than using block X+336?

This would at least allow segwit2x to team up with the BIP148 nodes in orphaning any blocks not compliant with BIP141.

TheBlueMatt commented 7 years ago

@jheathco That seems like a good idea, but it is way too late to be changing consensus rules post-release. This is precisely why you dont rush things like this, you end up regretting your decisions (see also P2SH, etc).

kek-coin commented 7 years ago

@jheathco how would that work in case BIP91 locks in after the BIP148 deadline?

JaredR26 commented 7 years ago

On that note, is there a reason that the Bitcoin Core 14.2 changes can't be brought forward in a post August 1st merge? I might be able to work towards that when I am home from vacation if there is support for it among the segwit2x team.

jheathco commented 7 years ago

@kek-coin I'd imagine you could simply default back to block X+336 in the case that lock-in happens after BIP148 activates.

jgarzik commented 7 years ago

@JaredR26 Yep - it was on the mental todo list - filed as #88

tomasvdw commented 7 years ago

This is trivially solved by some two or three big miners making sure they are connecting to each other.

In fact, this is already solved by the largest SegWit2x miner having more hashpower than all non-segwit2x miners combined.

Or as Gregory writes:

> So the network breaks into as many forks as there are partitioned 'btc1' miners until a single btc1 miner overtakes all the non-btc1 miners combined, which may not even be possible.

Of course this is possible. The opposite is hardly possible given the current hashrate stats.

nukebloodaxe commented 7 years ago

@tomasvdw Although technically right in a way, it is of cold comfort to the miners who find themselves building on the wrong partitioned chain. I can't imagine the issues it'll cause for early end users/exchanges as well, because of sly double spending by those utilising each partitioned chain.

Really a list of primary nodes interconnected with each other needs to be made now, and then for each mining team to manually connect to at least one of them; I know this has been indicated earlier, but it really needs emphasis.

marthinus-engelbrecht commented 7 years ago

Noob question. How serious is this? Are we talking about a potential accidental fork?

tomasvdw commented 7 years ago

We have to remain realistic. Currently the largest btc1 miner has 22%. I am trying to imagine a situation where

A. No btc1 route exists between this miner and another btc1 miner.
B. No connected set of btc1 miners exists with more than 22%.
C. More than 22% of the (non-segwit) miners decide after BIP91 lock-in to continue mining $25k blocks that they know will be orphaned.

Yes. It is a good idea to ping some miners to manually connect, but the scenario is not realistic.

Lejitz commented 7 years ago

All of this discussion is centered around the idea that every miner purportedly running btc1 will actually enforce and adhere to the bit 1 mandate after activation. But with false signaling, there could be quite a few that do not (even accidentally). Imagine the havoc that could wreak. Many nodes would follow the false signalers that mine blocks not signaling on bit 1, leaving the honest miners with little/no network support. I'm not sure that even a list of nodes would help too much in this scenario.

All of these problems would be fixed (or most alleviated) if the code were slightly modified to make activation coincide with BIP148 activation if BIP91 lock-in occurs before that time. If lock-in does not occur before that date, it could simply follow the X+336 schedule.

jgarzik commented 7 years ago

@JavierGonzalez btc1 is not the forum for that type of comment. We are trying to build a more civil and professional forum, and that does not befit the welcome sign graphic at the door.

kek-coin commented 7 years ago

@JavierGonzalez peering issues can be solved without hashrate.

jgarzik commented 7 years ago

~~This issue needs a chill-out period :)~~ ETA, unlocked.

Agree w/ @tomasvdw that this can be - and is - being solved with good peering.

And if I may put words in @kek-coin 's mouth, btc1 peering issues should be attended to irrespective of hashrate. We can solve related items already on the board (e.g. #86 ) along the way.

starrymoonlight commented 7 years ago

> BIP 148 doesn't appear to have this issue, and would actually heal this problem if the activation were concurrent. However it looks at least somewhat likely that bit4 will activate well ahead of BIP148, and if so they won't be a help.

Given the dangers of BIP91, if a significant group of miners could be convinced, wouldn't it be safer for miners to use BIP148 for activation? If so will btc1 consider merging in BIP148?

kek-coin commented 7 years ago

An idea I've seen floating in this thread would be to change the orphaning precondition such that if BIP91 activates before August 1st, BIP148-nodes will help out to alleviate the partitioning danger, while with a post-August 1st activation it would be on its own. This could be easily implemented by changing the check from "BIP91 ACTIVE" to "BIP91 ACTIVE && BIP148 ACTIVE".
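
A minimal sketch of that changed check, to make the two activation orders concrete. The function and parameter names are invented for illustration and this is not btc1 code; the activation states are passed in as plain booleans rather than derived from the chain.

```python
def rejects_block(signals_bit1: bool, bip91_active: bool,
                  bip148_active: bool, proposed_rule: bool) -> bool:
    """Return True if an enforcing node would reject (orphan) the block."""
    if proposed_rule:
        enforcing = bip91_active and bip148_active  # "BIP91 ACTIVE && BIP148 ACTIVE"
    else:
        enforcing = bip91_active                    # "BIP91 ACTIVE"
    return enforcing and not signals_bit1


# BIP91 active before August 1st (BIP148 not yet active), non-signaling block:
print(rejects_block(False, True, False, proposed_rule=False))  # True
print(rejects_block(False, True, False, proposed_rule=True))   # False
# After August 1st both checks agree:
print(rejects_block(False, True, True, proposed_rule=True))    # True
```

Under the proposed check, orphaning by BIP91 nodes would only begin once BIP148 nodes are enforcing as well, so the enforcing miners would not be relying on each other alone for relay.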

Changing consensus rules on such short notice may be undesirable but since there has not yet been an official release and since BIP91 is (imho) primarily relevant for miners - and therefore users can simply keep running Core (or BIP148) while it is being sorted out - this could be a viable mitigation strategy.

istrau2 commented 7 years ago

@kek-coin That sounds like a great solution.

JaredR26 commented 7 years ago


UASF is not a solution to BIP91.

On Jul 19, 2017 7:23 AM, "istrau2" notifications@github.com wrote:

> @kek-coin That sounds like a great solution.


kek-coin commented 7 years ago

@JaredR26 please re-read my suggestion.

JaredR26 commented 7 years ago

Oh, my bad, I had a different comment in my head when I wrote that from higher up. Sorry

It isn't a terrible idea, but I think (echoing what someone on the email list wrote) that trying to change course now might make the issue worse if any miners don't update twice. If they peer as specified the issue will be mitigated. If we had more time that might be a better approach.

istrau2 commented 7 years ago

@JaredR26 I see where you are coming from (changing the rules at this point is indeed risky), however I disagree. I think the change suggested by @kek-coin is quite minor and greatly increases the chance that the SF will go smoothly. Trying to fix this issue using some kind of peering hack is riskier in my opinion (and also significantly increases the burden on the miners in upgrading).

Additionally (while I may be misunderstanding something here), it seems to me that we are forgetting that we also want small miners to upgrade to BIP91-compatible software (at least eventually). It's all great that we can hack together the large miners and avoid serious network issues; however, these smaller miners are much more likely to be isolated (and therefore experience orphaning issues). Small miners will be very hesitant to upgrade to BIP91 (as well they should be), and this will lengthen the time period in which this issue exists as a real risk after the SF.

kek-coin commented 7 years ago

@JaredR26 to circle back on your psychological approach to bitcoin consensus rules; don't you think that miners and users joining forces to avoid a chainsplit might make for a mighty fine symbolic event?

istrau2 commented 7 years ago

Actually, thinking about this more. It seems to me that the users are actually the biggest risk here. In other words if Tom is a user that is not directly connected to any BIP91 node, then even if there is a 80/20 split between BIP91 and non BIP91, isn't there a decent chance that Tom could possibly get 3 confirmations to a block that is on the non-BIP91 chain?

I mean, the non-BIP91 nodes would keep building on top of the BIP91 blocks. All that is required is for the non-BIP91 nodes to build 3 blocks (with their 20% hashrate) before the BIP91 nodes can build 3 blocks AT ANY POINT IN TIME after the SF. Then Tom would see 3 confirmations on a transaction that would get wiped out pretty soon afterwards.

Am I thinking about this incorrectly?

kek-coin commented 7 years ago

Users and (small) miners not directly connected to the biggest BIP91 partition are at risk, yes.

istrau2 commented 7 years ago

Additionally, I am not sure that preferential peering would be a good solution. It seems like that could open BIP91 up to malicious fake signalling attacks.

christophebiocca commented 7 years ago

> then even if there is a 80/20 split between BIP91 and non BIP91, isn't there a decent chance that Tom could possibly get 3 confirmations to a block that is on the non-BIP91 chain?

This follows the same distribution as someone trying to do a 51% attack with minority hashpower, which is in the whitepaper.

So for 20% of hashpower there's a ~20% chance that after mining one block, the non-signalling and/or non-orphaning miners manage to extend their lead by 2 more blocks before the network catches up to them.

With the more realistic 12% of non-segwit2x miners, you're looking at a 7% chance of such a branch each time the 12% mines a non-segwit block on top of the longest chain (which, since GBminers and ~1/2 of slush signal anyway, is down to KanoCK pool and a few others, totalling ~5%).
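
For anyone who wants to check these figures: the ~20% and ~7% numbers are consistent with the attacker-success calculation from section 11 of the whitepaper evaluated at z = 2 (two more blocks after the one-block head start). Below is a Python port of that calculation; that this is how the figures above were derived is an assumption on my part.

```python
from math import exp


def attacker_success_probability(q: float, z: int) -> float:
    """Python port of AttackerSuccessProbability (Bitcoin whitepaper, section 11).

    q: fraction of hashpower held by the minority side
    z: number of blocks the minority side needs to make up
    """
    p = 1.0 - q
    lam = z * (q / p)
    total = 1.0
    for k in range(z + 1):
        poisson = exp(-lam)
        for i in range(1, k + 1):
            poisson *= lam / i  # builds lam**k * exp(-lam) / k!
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for q in (0.20, 0.12):
    print(f"q={q:.2f}, z=2: {attacker_success_probability(q, 2):.0%}")
```

This prints roughly 20% for q = 0.20 and roughly 7% for q = 0.12.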

But while there's a risk for users, it's worth pointing out that each such block mined, and any block mined on top will be completely orphaned eventually.

That's a lost reward of $30k each time.

So the complete list of who's at risk is:

Non-signalling miners: $30k for each block mined.
Signalling-non-orphaning miners: $30k each time they mine a block on top of a non-signalling block, which will occur about 5% of the time.
Non-peered signalling-and-orphaning miners: $30k for each block they mine trying to orphan an invalid chain, but that they don't get relayed to the 80% group.
Users not enforcing BIP91: Possibility of accepting payments from a branch that will be orphaned, which will happen 5% of the time for 1-confirmation, ~1.25% of the time for 2 confirmations, and ~0.4% of the time for 3 confirmations.
Users enforcing segwit2x but not peered to the main group: Will not see valid blocks until the segwit2x chain overtakes the non-segwit2x one, but will not accept the invalid ones. UX degradation.
Peered + signalling + orphaning miners: Business as usual.

As usual, knowing and enforcing the soft-fork rules is safer than being downgraded to SPV security, as long as a majority of hashpower is cooperating to enforce such rules.

Not clear to me why we'd want to discourage end users from running segwit2x nodes given the above. If anything we should be doing the opposite.

istrau2 commented 7 years ago

@christophebiocca I don't think your math is correct. The following seems incorrect to me:

> This follows the same distribution as someone trying to do a 51% attack with minority hashpower

Aren't you forgetting about how connectivity factors into the equation here? I mean, if a non-BIP91 block is mined, all the BIP91 nodes ignore it and keep mining on top of the previous block. However, the non-BIP91 nodes do not ignore it and begin mining on top of the new block. Then, the non-BIP91 nodes are given what could be a significant head start, aren't they?

christophebiocca commented 7 years ago

They have a one block head start which is why I used the "odds of making 2 blocks" instead of the "odds of making 3 blocks" from the calculations. Because yes, the non-orphaning nodes have a 1-block head start each time. But that has nothing to do with connectivity, just has to do with the fact that it's the majority trying to orphan the minority, instead of the other way around.

This already accounts for the snowball effect. I encourage you to read section 11 of the whitepaper, which explains the calculation in detail. I may have messed up the adaptation to this scenario. Specifically, the chance above is to get to a lead of 3 before being tied, but being tied does not immediately kill the chain, so the percentages might need to be scaled by 1 + q to be truly accurate.

istrau2 commented 7 years ago

@christophebiocca

Yes, I do see what you are saying (regarding the headstart not being related to connectivity). Still, related to the headstart, it would seem that connectivity does still matter because it determines the extent of the headstart. I mean, given the issue at hand, it is only the hashrate of the largest connected network of BIP91 nodes that matters. In other words the numbers you mention are an ideal (if all BIP91 nodes are connected).

christophebiocca commented 7 years ago

I had assumed your choice of 20% instead of the more realistic 12% was an attempt to account for a partitioned network of segwit2x-enforcing miners leading to a lower hashrate.

20/80 is the same ratio as 12/48, and 48% is less than the 4 biggest pools put together. It's already a pretty big assumption that we wouldn't get any other pool to addnode one of the big 4.

0rmi commented 7 years ago

@istrau2 Correct me if I'm wrong, but I believe that BIP91 miners are going to accept non-Segwit2x blocks as long as they signal "regular" segwit on bit1? So only miners with no signaling for segwit (of any kind) would have their blocks ignored, like BU miners for example.

And such miners should be a minuscule fraction of the total hashpower, and the likelihood of them mining any meaningful number of blocks on top of each other is very small, thus not making this a particularly scary issue?

ATBP commented 7 years ago

Many more SegWit2x nodes should be connected to the network before miners even start signalling. Miners should be the last ones to upgrade to the btc1 client, as correctly described in the SegWit2x Calendar at https://segwit2x.github.io/

July 14 - Agreement Participants Install and Test Milestone
July 21 - Nodes Running & Signaling begins

istrau2 commented 7 years ago

@christophebiocca

Ah, I see. Ok, makes sense. What do you think is the best solution to this issue?

kek-coin commented 7 years ago

@0rmi BIP91 enforces mandatory bit-1 signalling. This means that non-S2X miners will not be orphaned if they signal for OG-Segwit, while certain S2X miners will be orphaned if they don't start signalling bit-1 in time (ie. anyone with the 0x20000010 version; e.g. BTC.TOP, Antpool, BATPOOL, BTC.com, VIABTC are risking self-orphanage at this point).
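
To make the version example concrete, here is a small sketch that lists which version bits a block header signals. The top-bits mask and value are the BIP 9 constants; the function names are made up for this illustration.

```python
VERSIONBITS_TOP_MASK = 0xE0000000  # BIP 9: top three bits of nVersion
VERSIONBITS_TOP_BITS = 0x20000000  # BIP 9: top bits must be 001


def signaled_bits(version: int) -> list:
    """Return the version bits (0..28) set in a BIP 9 style block version."""
    if (version & VERSIONBITS_TOP_MASK) != VERSIONBITS_TOP_BITS:
        return []  # not a version-bits block at all
    return [bit for bit in range(29) if version & (1 << bit)]


def signals_bit1(version: int) -> bool:
    return 1 in signaled_bits(version)


for version in (0x20000010, 0x20000012, 0x20000002):
    print(hex(version), signaled_bits(version), signals_bit1(version))
```

A block with version 0x20000010 sets bit 4 only, so it would fail a mandatory bit-1 check, while 0x20000012 (bits 1 and 4) or 0x20000002 (bit 1 alone) would pass.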

christophebiocca commented 7 years ago

> S2X miners will be orphaned if they don't start signalling bit-1 in time (ie. anyone with the 0x20000010 version; e.g. BTC.TOP, Antpool, BATPOOL, BTC.com, VIABTC are risking self-orphanage at this point).

There's a lock-in period of ~3 days between the 80% threshold and the start of orphaning. They're not risking anything at this point unless you think they'll all stop paying attention and go on vacation for a few days right before the threshold is reached.

kek-coin commented 7 years ago

I didn't mention F2Pool did I?

btc1 / bitcoin

convergence fault in the presence of non-segwit signaling blocks #85