Open pmconrad opened 5 years ago
Calling this "resigning" isn't a great name, IMO, as it's often just done temporarily to avoid missed blocks while overcoming a short-term technical problem.
I agree, we should allow any vote collecting entity to signal their desire to be removed from consideration for that role/function. Perhaps the flag can be named something focusing on voting: acceptingVotes, accumulateVotes, isVoteEligible. The flag would be set to true by default when creating a new entity.
Separate issue, but following on what @pmconrad offered above for block producers, I feel the DAC has an expectation of high availability. Therefore, I suggest the above flag be set once a BP has missed 1% of time slots in the previous maintenance interval. The calculation for the next round would then ignore the failing BP, and the next stand-by would produce. Once the degraded BP restores their operations, they may reset the flag to signal intent to be included in the next maintenance interval vote. This results in at least one maintenance interval of "penalty" for missing 1% of blocks recently.
Design should consider edge cases such as prolonged network halts.
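To make the proposal above concrete, here is a minimal sketch of how such a flag could interact with the maintenance-interval vote tally. It is not bitshares-core code: `Producer`, `accepting_votes`, `MISS_THRESHOLD` and `run_maintenance` are hypothetical names, and the 1% threshold is applied to each producer's own assigned slots, which is only one possible reading of the suggestion.

```python
from dataclasses import dataclass

MISS_THRESHOLD = 0.01  # hypothetical: fraction of a producer's own slots missed


@dataclass
class Producer:
    name: str
    votes: int
    accepting_votes: bool = True  # true by default for a new entity
    assigned_slots: int = 0       # slots scheduled in the previous interval
    missed_slots: int = 0         # slots missed in the previous interval


def run_maintenance(producers, active_count):
    """Flag degraded producers, then pick the top eligible ones by votes."""
    for p in producers:
        if p.assigned_slots and p.missed_slots / p.assigned_slots >= MISS_THRESHOLD:
            p.accepting_votes = False  # stays off until the owner resets it
    eligible = [p for p in producers if p.accepting_votes]
    eligible.sort(key=lambda p: p.votes, reverse=True)
    return [p.name for p in eligible[:active_count]]


producers = [
    Producer("alice", votes=900, assigned_slots=48, missed_slots=0),
    Producer("bob", votes=800, assigned_slots=48, missed_slots=5),
    Producer("carol", votes=700),  # stand-by, had no slots assigned
]
print(run_maintenance(producers, active_count=2))  # ['alice', 'carol']
```

In this sketch bob keeps his votes but is skipped until he sets `accepting_votes` back to true himself, which gives the "at least one maintenance interval of penalty" behaviour described above.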
Regarding the auto-setting of such a flag, it has been considered before for Steem witnesses, but I would approach the design of any such feature with extreme caution, as it opens up the chain's block production to new forms of attack. I'm not saying I'm outright opposed to such a capability, but given potential abuses, the design of any such feature should be examined closely for all potential attack vectors it opens up, and such problems should be weighed against the benefits provided by it.
However, calling it "disabled" is also not very descriptive, and smacks of someone other than the witness / worker / committee member doing the disabling outside of their control. That is a governance issue.
The intent of this feature sounds reasonable to some degree, but if the issue is simply saving missed blocks, it negates the ability to use missed blocks as a metric to evaluate witness performance. Clearly when workers or committee are included it goes beyond that consideration.
I see risks to this, in that if the mechanism is compromised it could shut down, even selectively, various important aspects of the chain, including a total shutdown. I recognize that may not be likely, but this is a powerful feature that shouldn't be implemented in haste, and there is no reason to do so.
@ryanRfox I very much like your idea. 1% translates to 4.8 blocks an hour, with 25 witnesses, 1 hour maintenance, 3 second block time. With 21 witnesses it becomes 5.7 blocks (1200 blocks an hour / # of witnesses).
This sounds very reasonable, however I suggest we make the threshold a committee parameter should it require adjustment. It is easily imagined that the flag reset operation could be included in scripts or bots, which could limit this feature's effectiveness. Also worth considering is automatic compensation for the number of witnesses, such that 1% represents 4.8 or 5.7 missed blocks for 25 vs 21 witnesses respectively.
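The arithmetic is worth spelling out, since the percentage can be read against different bases. The sketch below uses only the figures quoted above (1 hour maintenance interval, 3 second block time, 25 or 21 witnesses); the function names are made up for illustration. By this calculation, 4.8 and 5.7 blocks correspond to 10% of a single witness's slots, while 1% of a single witness's slots is about 0.48 and 0.57 blocks, and 1% of all 1200 slots is 12 blocks. Which base was intended is not stated in the thread.

```python
def slots_per_witness(interval_s, block_time_s, witnesses):
    """Slots one witness is scheduled for per maintenance interval."""
    return interval_s / block_time_s / witnesses


def missed_block_threshold(fraction, interval_s=3600, block_time_s=3, witnesses=25):
    """Missed blocks that `fraction` of one witness's slots represents."""
    return fraction * slots_per_witness(interval_s, block_time_s, witnesses)


for n in (25, 21):
    per_witness = slots_per_witness(3600, 3, n)
    print(
        f"{n} witnesses: {per_witness:.2f} slots each, "
        f"1% = {missed_block_threshold(0.01, witnesses=n):.2f}, "
        f"10% = {missed_block_threshold(0.10, witnesses=n):.2f}"
    )
```

Expressing the threshold as a fraction of each witness's own slots, as done here, also gives the automatic compensation for the number of witnesses mentioned above.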
All in all I like this discussion, as it is the first time in years that metrics for witness performance and automatic intervention have been on the table.
I feel additional interventions should also be considered, such as a way to intervene when witnesses fail to upgrade their nodes in a timely fashion. Not saying how that could/should be implemented, but it makes sense to ensure the integrity of the witness role and security of production.
Agree @dnotestein. I suggest we table further discussion on that topic to a distinct issue.
> Also, a witness who has missed all his blocks between two maintenance intervals should be disabled automatically.
Why reinvent the wheel? Do it exactly like Steem. Two maintenance intervals is too short of a time period. It could launch a couple of attack vectors ....
> but I would approach the design of any such feature with extreme caution
Absolutely! I think a small number of colluding witnesses could cause irregular block misses for a given victim witness, so a small percentage as suggested by Ryan would indeed be dangerous.
> Two maintenance intervals is too short of a time period. It could launch a couple of attack vectors ....
Please elaborate.
IMO a witness who has been missing blocks for an hour is unlikely to produce in the next hour, so for the health of the chain it would be beneficial to remove him. An important part of the proposal is that the setting is reversible, i.e. as soon as the witness has fixed his server he can re-enable his witness and start producing after being voted in again in the next maintenance block.
Automatic disabling must take extreme situations (chain halt) into account. E.g. witnesses are not disabled if LIB is before the previous maintenance interval.
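A rough sketch of that rule, combined with the "missed all his blocks" condition from the issue description, might look like the following. The function name, the shape of `stats`, and the use of plain numbers for times are all assumptions made for illustration; this is not how bitshares-core represents any of these.

```python
def witnesses_to_pause(stats, lib_time, prev_maintenance_time):
    """Return witnesses that missed every assigned slot in the last interval.

    stats maps witness name -> (assigned_slots, produced_blocks).
    Times are plain numbers (e.g. seconds); all names are hypothetical.
    """
    # Chain-halt guard: if the last irreversible block predates the previous
    # maintenance interval, the chain itself stalled, so nobody is paused.
    if lib_time < prev_maintenance_time:
        return []
    return sorted(
        name
        for name, (assigned, produced) in stats.items()
        if assigned > 0 and produced == 0
    )


stats = {"w1": (48, 48), "w2": (48, 0), "w3": (0, 0)}
print(witnesses_to_pause(stats, lib_time=7190, prev_maintenance_time=3600))  # ['w2']
print(witnesses_to_pause(stats, lib_time=3000, prev_maintenance_time=3600))  # []
```

A stand-by like `w3` that had no slots assigned is never paused, since there is nothing to judge it by.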
Technically it's doable for witnesses, because there is something to check, so we know whether a witness is active.
For committee members and workers, there is nothing to check; it relies entirely on the creators' action. People may create a committee member / worker and then forget about it ("vote with their feet"). Even so, IMHO there's no harm in having such a feature.
> IMO a witness who has been missing blocks for an hour is unlikely to produce in the next hour, so for the health of the chain it would be beneficial to remove him.
That's ok! The danger is for potential malicious witnesses to get activated too fast, before the community takes notice and has the opportunity to act/defend. So it's much "healthier" for the chain to have lower performance for a couple more hours while greatly increasing security!
Maybe a good idea would be to allow a fast witness "swap" (2 voting intervals like mentioned) for only 1 witness swap per 24 hours, and if another incident occurs within those 24 hours, to increase the waiting time for the next swap considerably ;)
> However, calling it "disabled" is also not very descriptive, and smacks of someone other than the witness / worker / committee member doing the disabling outside of their control.
Maybe calling it: Witness "paused"
I like "paused" better than disabled, if only one word is used. If 2 words, "temporarily disabled" (rather long) or "resigned / paused". I dislike the single word "disabled" for this action. It sounds too final, and too authoritarian / controlled. Resigned alone is not too bad either. I know Wackou would use this and probably many others. The nice thing about this proposal / idea is the witness can decide later to reactivate themselves.
What about Activate / Deactivate?
Regarding automatically deactivating a witness node, I suggest waiting at least 12 hours after a witness starts missing blocks. After a good 8 hours of sleep, they still have 4 hours to recover the witness node.
One huge impact of rotating witnesses every 2 hours is on the price feed. An inactive witness's price feed stays in place for 24 hours until it expires. If we have 4 witnesses missing blocks and the backup witnesses are ready to provide price feeds but not to produce blocks, then we will have a domino effect: after 12 hours, there will be 24 inactive witnesses providing outdated price feeds vs 23 active witnesses providing updated price feeds.
> For committee members and workers, there is nothing to check; it relies entirely on the creators' action.
Yes. This is supposed to be a tool for responsible community members. Irresponsible members will hopefully be voted out in the normal way.
> The danger is for potential malicious witnesses to get activated too fast
That would mean the potential malicious witness would have sufficient votes to be the next in line, just at the time when an existing honest witness stops producing. I think that's unlikely. And even if a malicious witness gets in - what can he do?
Whenever someone new appears and offers to run a witness, the community doesn't know if he is malicious or not. The only way to find new good witnesses is to accept new witnesses, find out if they are good or bad, and fire the bad ones. Meaning we are well prepared to deal with a malicious minority anyway. This proposal does not change that.
> allow a fast witness "swap" (2 voting intervals like mentioned) for only 1 witness swap per 24 hours
Unlike STEEM we have a maintenance interval to perform maintenance tasks, and updating the list of active witnesses is maintenance. Any additional rules are a violation of the KISS principle, and therefore need a good reason. I haven't seen one yet.
The question how many witnesses can be changed in a single maintenance interval will have to be addressed in the context of https://github.com/bitshares/bitshares-core/issues/1369 and https://github.com/bitshares/bitshares-core/issues/1444 anyway. IMO it is out of scope here.
> One huge impact of rotating witnesses every 2 hours is on the price feed.
The same applies to normal witness voting. IMO automatic pausing (or whatever you want to call it) of witnesses does not change that. Quite the opposite: it improves the situation by replacing an inactive witness with a (hopefully) active one. A single outdated feed price will not have a significant effect on the median.
The extreme situation that you describe is extremely unlikely, IMO. Every witness should have a backup node in active standby, best with automatic failover. If so many witnesses drop out at once we probably have bigger problems than the feed price.
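The point about a single outdated feed and the median can be illustrated with made-up numbers. The prices below are invented for the example, and the chain's actual feed aggregation is not reproduced here; the sketch only shows the general property of a median compared with a mean.

```python
from statistics import mean, median

# Hypothetical feed prices from seven up-to-date witnesses.
fresh = [0.098, 0.099, 0.100, 0.101, 0.102, 0.103, 0.104]
stale = 0.150  # one outdated feed, far from the current price

print("median without / with stale feed:", median(fresh), median(fresh + [stale]))
print("mean   without / with stale feed:", mean(fresh), mean(fresh + [stale]))
```

With these numbers the median moves by about 0.0005 while the mean moves by about 0.006. The same property cuts the other way once outdated feeds approach half of all feeds, which is the scenario the domino-effect concern describes: at that point the median follows the stale values.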
> That would mean the potential malicious witness would have sufficient votes to be the next in line, just at the time when an existing honest witness stops producing. I think that's unlikely. And even if a malicious witness gets in - what can he do?
For one witness I can't think of any problem.
But what happens if, for example, a bad actor finds the IP addresses of 6+ witnesses and DDoS-attacks them simultaneously, so that they all miss blocks together and are replaced by 6 malicious witnesses in a short time frame? A little bit of chaos would be the minimum accomplished... That's why I suggested increasing the time, after the first (or second) witness swap, before the next swaps can happen....
Right now, an attacker who can DDOS 6 of 21 witnesses will cause a chain halt after some time because LIB will no longer advance. This proposal improves on that in that the DDOS'd witnesses will be replaced automatically until they can remedy the situation.
Even if the replacement witnesses are malicious, they can't do very much. Maybe prevent some transactions from being included in the chain, if there's enough of them. Maybe skew the price feeds a little.
It may be debatable how likely such an attack is, and if this proposal is an improvement over the current situation or not. I think it is very unlikely, therefore I would prefer this simple-but-not-quite-perfect proposal over a more complicated one, keeping in mind that the number of witnesses that can be replaced/removed/added per round will probably be restricted at some point in the future anyway.
From the witness channel on Telegram:
Sometimes, witnesses or committee members suddenly get voted in who have previously declared that they step down / resign from their posts. In the case of witnesses this leads to instability of the chain, in the case of committee members it can make it difficult to enact important governance decisions. It should therefore be possible for witnesses and committee members to flag their resignation on-chain, so that they are ignored during voting. Also, a witness who has missed all his blocks between two maintenance intervals should be disabled automatically.
A comparable mechanism for witness resignation already exists on Steem and other Steem-based chains.
Similarly, it should be possible to resign workers. A typical use-case might be that the funding goal has been reached prematurely. Resigning such a worker would free up the budget for other projects.
The resignation must be made reversible.