Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0

Draft: Validation Diagnostics #4510

Open bedeho opened 1 year ago

bedeho commented 1 year ago

Problem

The health of the validation ecosystem is very important. The chain can keep working while conditions are deteriorating. Unless information about that health is cheap and reliable to obtain, people will not be able to act in time to reallocate stake or recover from impending failure modes. The failure modes we care about are loss of liveness or consistency, possibly as a result of actions by a byzantine adversary. Having validator identity metadata available on-chain would also be very useful.

Solution

Query Node

Add support to the QN for rich historical information associated with validators. When this information lives in the QN, it is easy to access for apps like Pioneer and the CLI, and for alert-based services that different actors may want to run to detect bad scenarios with low latency. Information we are interested in would be

Question: here we really need to think about what nominators want to know when picking validators

CLI

Add support to the CLI for listing key information about the state of the chain and validation. This information should either be printable to screen or dumpable to an output file in a structured format that is easy to parse programmatically, e.g. JSON.
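To make the screen-versus-file idea concrete, here is a minimal sketch of a renderer that produces either a human-readable line per block or JSON. The `BlockSummary` shape, its field names, and the `render` function are all hypothetical and only for illustration; they are not part of the existing CLI.

```typescript
// Hypothetical shape for one row of a chain-status report (illustrative field names).
interface BlockSummary {
  height: number
  hash: string
  timestamp: string // ISO 8601
  txCount: number
  sizeBytes: number
}

type OutputFormat = 'text' | 'json'

// Render the same data either for the terminal or as machine-readable JSON.
function render(rows: BlockSummary[], format: OutputFormat): string {
  if (format === 'json') {
    return JSON.stringify(rows, null, 2)
  }
  return rows
    .map((r) => `#${r.height} ${r.hash} ${r.timestamp} txs=${r.txCount} size=${r.sizeBytes}B`)
    .join('\n')
}
```

The returned string could be printed to screen or written to an output file unchanged, so both modes share one data model.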

The information that immediately comes to mind is:

Chain

Most of this information should be available in the state, but some extra historical information, e.g. metadata, would need to be added.

Feedback

I am in particular looking for feedback on

bwhm commented 1 year ago

This looks very closely related to what I had in mind as well!

All the stats/data can be fetched without query node support, but for a few reasons it's not ideal. One is staking.historyDepth, which denotes the number of eras for which most of the relevant information can be fetched without having to query the state at some past block. Although 120 eras is quite a lot (about 30 days, assuming "perfect" block production), it's still pretty valuable to keep it in the QN for faster and easier access.
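The arithmetic behind the 30-day figure can be sketched as follows. The 120-era depth comes from the comment above; the 6-second block time and 3600 blocks per era are assumptions chosen so that one era lasts 6 hours, which is what 120 eras over 30 days implies. The helper names are hypothetical.

```typescript
// From the comment above: staking.historyDepth is 120 eras.
const HISTORY_DEPTH_ERAS = 120
// Assumptions (not confirmed by the thread): 6-second blocks, 6-hour eras.
const BLOCK_TIME_SECONDS = 6
const BLOCKS_PER_ERA = 3600

// How many days of era data remain readable from current state,
// assuming every block is produced on time.
function retentionDays(historyDepth: number, blocksPerEra: number, blockTimeSeconds: number): number {
  const seconds = historyDepth * blocksPerEra * blockTimeSeconds
  return seconds / 86400
}

// Rough check for whether an era's data should still be in current state.
// The inclusive lower bound follows the usual [currentEra - historyDepth, currentEra]
// convention; treat the exact boundary as an assumption.
function isEraInState(era: number, currentEra: number, historyDepth: number): boolean {
  return era <= currentEra && era >= currentEra - historyDepth
}

const days = retentionDays(HISTORY_DEPTH_ERAS, BLOCKS_PER_ERA, BLOCK_TIME_SECONDS) // 30
```

Anything for which `isEraInState` returns false would need either a state query at a past block or a QN lookup.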

Off the top of my head, here is what the QN should store:

In addition, keep track of all bonded stash accounts, and keep a record of activity.

I'm sure there are a few other things that could be added, and some of it could be scrapped, but this would be pretty complete...


On the CLI side:

  • Most recent finalised block by block height, including key information like the congestion variable, time, size, hash, height, tx count, etc.
  • Most recent unfinalized block, same information.

I'd say this is a single command like chainData:blockInfo, which takes one of the below input(s):

  • Presence of chain splits, i.e. finalisation of blocks on either side of a fork.

To be complete, this might require reading from the Polkadot telemetry server, but I think you can get this from the nodes your endpoint is connected to.
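As a rough sketch of what a split check could look like once finalized heads have been collected from several nodes: group the reports by height and flag any height where more than one hash has been finalized. The `FinalizedReport` shape and `conflictingHeights` are hypothetical names, and how the reports are gathered (telemetry or connected peers) is left open.

```typescript
// Hypothetical report: what one node claims is the finalized block at a height.
interface FinalizedReport {
  node: string
  height: number
  hash: string
}

// Returns the heights at which nodes disagree on the finalized hash,
// i.e. blocks finalized on both sides of a fork.
function conflictingHeights(reports: FinalizedReport[]): number[] {
  const hashesByHeight: { [height: number]: string[] } = {}
  for (const r of reports) {
    const seen = hashesByHeight[r.height] || (hashesByHeight[r.height] = [])
    if (seen.indexOf(r.hash) === -1) seen.push(r.hash)
  }
  return Object.keys(hashesByHeight)
    .map(Number)
    .filter((h) => hashesByHeight[h].length > 1)
    .sort((a, b) => a - b)
}
```

An empty result only means no split was visible among the nodes sampled, which is why a wider source such as telemetry may be needed for completeness.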

Validation

  • Number of validators in the validation set, with key information associated with each, like amount of stake backing, number of nominators, commission, when they last checked in, unpaid reward, impending slashes, and whatever else would be useful.

Except for the bolded one, which I either don't understand or is impossible, these are the important ones. This can be implemented already, but it will be slow without QN support (especially for eras older than 1 month).
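For illustration, a per-validator record covering a few of these fields could be derived along these lines. The `Exposure` and `ValidatorSummary` shapes are simplified, hypothetical stand-ins, balances are plain numbers here purely for readability, and the commission is assumed to be expressed in parts per billion.

```typescript
// Simplified, hypothetical stand-in for a validator's stake exposure in one era.
interface Exposure {
  own: number
  others: { who: string; value: number }[]
}

// Hypothetical summary row for one validator.
interface ValidatorSummary {
  stash: string
  totalStake: number
  ownStake: number
  nominatorCount: number
  commissionPercent: number
}

// Assumption: commission arrives in parts per billion, so 1e9 means 100%.
function summarize(stash: string, exposure: Exposure, commissionPerbill: number): ValidatorSummary {
  const nominated = exposure.others.reduce((sum, n) => sum + n.value, 0)
  return {
    stash,
    totalStake: exposure.own + nominated,
    ownStake: exposure.own,
    nominatorCount: exposure.others.length,
    commissionPercent: commissionPerbill / 1e7,
  }
}
```

Fields such as last check-in, unpaid reward, and impending slashes are left out of this sketch because they would come from other sources.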

  • Current overall rate of staking.

...and expected rewards based on this
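A minimal sketch of both numbers, under loud assumptions: the staking rate is taken as total staked over total issuance, and the expected return is a naive estimate that divides a given annual payout by the total staked. The real payout depends on the chain's reward configuration, which this sketch does not model; both function names are hypothetical.

```typescript
// Fraction of the total token supply that is currently staked.
function stakingRate(totalStaked: number, totalIssuance: number): number {
  if (totalIssuance <= 0) throw new Error('totalIssuance must be positive')
  return totalStaked / totalIssuance
}

// Naive estimate: assumes a known annual payout shared pro rata by all stakers,
// before validator commission. The payout figure itself is an input, not derived.
function expectedAnnualReturn(annualPayout: number, totalStaked: number): number {
  if (totalStaked <= 0) throw new Error('totalStaked must be positive')
  return annualPayout / totalStaked
}
```

For example, with 250 staked out of 1000 issued and an annual payout of 30, the rate is 0.25 and the naive return is 0.12.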

  • Impending slashes.

Yeah, offences and slashing spans as well.