cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.27k stars 3.63k forks source link

[Feature]: Provide a way to query validators' jails and their reasoning #20188

Open freak12techno opened 6 months ago

freak12techno commented 6 months ago

Summary

Add a query or something similar to return validators' jails and why did they happen.

Problem Definition

Currently there's no way to see the occurrences of validators' jails and why did they happen. There's a query to get the validators' slashes from block X to block Y, but there are some jail cases when the slashing doesn't occur.

With ICS introduction, it's getting more complex, and I've personally seen and helped a few validators who got jailed for downtime on consumer chains and these jails were completely unexpected and unclear for them.

So for now I know at least 5 cases why a validator can get jailed (unbonding all of the self-stake; downtime on a provider chain; downtime on a consumer chain; double sign on a provider chain; double sign on a consumer chain), and that's not including some custom jails that some chains introduce (like on Kujira, a validator is jailed if it doesn't run a price-feeder).

I think if there's an endpoint that will return the validators' jails (ideally throughout the whole time, so they won't get pruned, like votes, and would be available even on a pruned node), with the height the validator was jailed at, jail time/whether it's permanent, maybe slashing percent, jail reason, and some metadata (like if it's ICS downtime jail, it may include which chain a validator have missed enough blocks at; ideally extendable, so if a chain has a custom way of jailing a validator, they can put some metadata explaining the jail).

This should benefit both the validators (for example, with my example above; if they get jailed they can easily see the reason why) and the chain users in general (such a query can be highly benefitial for explorers, by showing a validators' jails history, which can lead to delegators seeing more info about their validators and being able to do their due diligence better).

Proposed Feature

  1. Have an endpoint/gRPC query that return data in a format similar to this::
[
    // Example is for hypothetical jail for ICS consumer chain downtime
    {
        "validator_address": "cosmosvaloper1xxxxx", // Validator address, or consumer address
        "height": 123, // Jail height
        "slashing_percent": 0.00, // How many % of the stake was slashed
        "jail_time": "600s", // How much is it till a validator can unjail
        "permanently_jailed": false, // Whether it's a double sign
        "reason": "ICS_CONSUMER_DOWNTIME", // Some enum value, or store it in a metadata
        "metadata": { // Some metadata that can be implemented differently per each reason
            "consumer_chain_id": "neutron-1"
        }
    }
]
  1. When jailing a validator for any reason, add a new entry into the store with this data.
jtremback commented 6 months ago

This is a very good idea. Needs some work in the Cosmos-SDK to make it happen. @tac0turtle what's your take?

freak12techno commented 6 months ago

I guess this is how it might be done (consider this comment a mini-ADR lol):

1) add a new struct representing jail info (with the fields like in the example I provided in the issue) 2) add a new collections.Map to staking keeper to store these structs 3) change the signature of https://github.com/cosmos/cosmos-sdk/blob/main/x/staking/keeper/slash.go#L211 to accept this struct as a 3rd argument and save it to store after jailing a validator 4) change the usage of this method in all places by providing jail info to it 5) add a gRPC/REST query to return the jails info per validator (either by a valcons or a valoper addr, not sure) and probably another one to return all jails info for all validators

Unlikely that we can generate jail infos for previous jails, so I guess we can only track new jails. Also we'll probably need to consider the cases when a validator rotates its keys (probably by 1) taking a validator from chain, 2) taking all of its keys from the store and 3) searching for jail infos for all of these).

Feel free to assign me to do this btw, sounds like a challenge lol.

tac0turtle commented 6 months ago

this goes into a larger conversation around state management. State machine vs client. I think we should delete all information within modules that is not needed for the state machine. Then we have a side car process that indexes things related to querying. It would be like the tx_index.db of comet but have lots more information. Application developers could have a state machine schema and a query/client schema which would translate the state machine data + events into the schema needed for queries. The more information we add to the node directly and consensus the slower the overall state machine becomes, even with small amounts of information.

We have been discussing this internally and are thinking through different designs. Once we have agreement we will begin to build something out.

freak12techno commented 6 months ago

@tac0turtle is there an issue for discussing this sidecar approach? I actually agree that this jail info isn't something that's required by the node directly and it'd make extra bloat for the app itself, yet it seems like indexing/storing it (in a sidecar process) being able to query for it would benefit those developing apps on Cosmos and validators.

tac0turtle commented 6 months ago

We dont store this information on chain as it was meant to be indexed via events or some other service. If we were to add it nodes would not have any legacy information. With the side car process we would be able to go back in time and extract the information.

ill update this issue with the current discussions https://github.com/cosmos/cosmos-sdk/issues/18000

vshybalov commented 6 months ago

This is very good idea, as our validator was constantly getting jailed recently without clear reason, only discussion in Discord community helped us to address the issue.

May be there is also good idea to add more info in the node logs, right now it just writes:

INF validator jailed module=x/staking validator=*****

But there is no any reason or additional info.

tac0turtle commented 6 months ago

there is definitely more information on why. Have you looked at the events?

vshybalov commented 6 months ago

I agree, there are some info in the events, but it is hard to create relation sometimes between event message as a jail reason. At least for the less experienced users.

I mean to make it more user friendly and include more info in the same entry about jail.

jtremback commented 6 months ago

So i guess the debate here is that @tac0turtle feels that extraneous information should not be kept in state if you can just look it up in events. That's definitely a fair point and it's a common practice in blockchain development.

@freak12techno feels that this may be true, but in this case the extra information is so useful that it should be stored.

I should probably know this, but what would the command be to get a focused query to find out exactly why a validator was jailed, using events? Like, filtering for only jailing events with their address. Is that possible? That would probably satisfy @freak12techno's use case without adding any state bloat.

freak12techno commented 6 months ago

One more point: IMO the issue @tac0turtle outlined isn't really blocking the development of this, I think we can add it to the staking module and store it in the same db the same way we store for example delegations, then later, once the sidecar is introduced, move it there (as we'll have to move a lot of stuff to the sidecar anyways I guess?), what do you think of that? @jtremback @tac0turtle

Also, jails aren't a thing that happen quite often, like delegations, so IMO they shouldn't add that much of an overhead.

About finding out a reason why the validator was jailed: IMO ideally it should be a CLI command that allows you to see your latest jails and why they happened, or a query, or both (ideally both, as it's useful not only for validators, but for other people who build on Cosmos, like for explorers, so they can display validators' slashing history). Can this be done via events? As far as I know most of the <simd> query subcommands use /abci_query which takes the data from the node which takes the data from the db.