cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0

CosmJS QueryClient validatorDelegations hangs on cosmos nodes #12756

Open oblak1 opened 2 years ago

oblak1 commented 2 years ago

Summary of Bug

I posted this issue on the cosmjs repository and was asked to post it here, as this could actually be an SDK issue.

We've tried calling validatorDelegations from the staking module of the cosmjs library on multiple Cosmos nodes via RPC port 26657 (public ones as well as our own), and every time it hangs without returning anything; the node actually stops syncing. Sometimes we get a 504 timeout, which doesn't tell us much.

We've been using cosmjs on multiple other tendermint nodes (such as axelar and gravity-bridge) where this issue doesn't arise.

Perhaps this issue happens due to the larger data scale on Cosmos. We've tried manually creating a staking extension with a smaller pagination size, but it still didn't work.

We've tried querying the REST and gRPC ports (via CLI), and both work okay.

Version

This was last tested on public rpc cosmos node tm version 0.34.14, gaia 7.

Steps to Reproduce

Install the cosmjs library, connect to a Cosmos node, and call QueryClient.staking.validatorDelegations for one of the active validators.

webmaster128 commented 2 years ago

Looking at the Cosmos SDK codebase I understand two things (correct me if I'm wrong):

If that is the case, I guess the query implementation needs to be improved to support networks with large numbers of delegations.

alexanderbez commented 2 years ago

@webmaster128 sounds about right.

Note, for ValidatorDelegations, we perform a paginated query based on the delegations index. We should query over the validator's delegations instead, but we have no such tupled index.
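
To illustrate the point about the missing index, here is a toy in-memory model in Python. It is not SDK code: the key layouts, the function names, and the data are all hypothetical, and the only assumption taken from this thread is that the existing store is ordered by delegator, so a per-validator query has to walk it and filter. The sketch counts how many keys each approach visits to fill one page.

```python
from bisect import bisect_left


def build_toy_data(num_delegators=1000, num_validators=10):
    """Return (store, index) as sorted key lists for a made-up delegation set.

    store is ordered by (delegator, validator), index by (validator, delegator).
    Each delegator delegates to exactly one validator in this toy data.
    """
    pairs = [(f"del{d:04d}", f"val{d % num_validators}") for d in range(num_delegators)]
    store = sorted(pairs)
    index = sorted((val, dele) for dele, val in pairs)
    return store, index


def page_by_filter(store, validator, limit):
    """Walk the delegator-ordered store and keep entries for one validator."""
    matches, visited = [], 0
    for delegator, val in store:
        visited += 1
        if val == validator:
            matches.append(delegator)
            if len(matches) == limit:
                break
    return matches, visited


def page_by_index(index, validator, limit):
    """Seek to the validator's prefix in a validator-ordered index, then read."""
    matches, visited = [], 0
    start = bisect_left(index, (validator, ""))
    for i in range(start, len(index)):
        val, delegator = index[i]
        visited += 1
        if val != validator:
            break  # left the validator's key range
        matches.append(delegator)
        if len(matches) == limit:
            break
    return matches, visited


if __name__ == "__main__":
    store, index = build_toy_data()
    print("filter scan visited:", page_by_filter(store, "val3", 100)[1])
    print("index scan visited:", page_by_index(index, "val3", 100)[1])
```

In this model the filtered scan's cost grows with the total number of delegations on the chain, while the indexed scan's cost grows only with the page size, which would be consistent with the problem showing up on larger networks.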

oblak1 commented 2 years ago

I should also mention we experienced similar issues when indexing accounts via auth/accounts module, probably for the same reasons.

mariopino commented 2 years ago

Hi, same problem here! I'm indexing delegations on evmos mainnet (342157 delegations right now) and the performance is poor (it takes around 11 hours to index all delegations from the 290 validators). Often after that the node is in a catching-up state or becomes unresponsive.

Some tests:

time curl "http://localhost:1317/cosmos/staking/v1beta1/validators/evmosvaloper1qq95x6dhrdnrfunlth5uh24tkrfphzl9crd3xr/delegations?pagination.limit=100&pagination.offset=0&pagination.count_total=true"
...
real    0m31.048s
user    0m0.008s
sys     0m0.000s

time grpcurl -plaintext -d '{"pagination": {"limit": 100}, "validator_addr": "evmosvaloper1qq95x6dhrdnrfunlth5uh24tkrfphzl9crd3xr" }' localhost:9090 cosmos.staking.v1beta1.Query/ValidatorDelegations
...
real    0m21.363s
user    0m0.005s
sys     0m0.025s

jgimeno commented 1 year ago

Any updates on this issue?

elias-orijtech commented 1 year ago

I'd like to analyze this further to fix it, but I would need exact instructions for reproducing the issue. Specifically how to build a node and connect to a chain that exhibits the behaviour. Anyone interested?

atheeshp commented 1 year ago

Recently, we added an index in the SDK to speed up the validatorDelegations query; maybe this could help fix this issue in coming versions. https://github.com/cosmos/cosmos-sdk/pull/15731

tac0turtle commented 1 year ago

> I'd like to analyze this further to fix it, but I would need exact instructions for reproducing the issue. Specifically how to build a node and connect to a chain that exhibits the behaviour. Anyone interested?

Running a Cosmos Hub or other mainnet node would produce similar results. Let me know which chain you'd like to run and I can grab you instructions.

elias-orijtech commented 1 year ago

Thanks. Any chain would do; evmos and gaia are mentioned in this issue. I prefer the one easiest to test :)

iramiller commented 2 months ago

For ValidatorDelegations the default limit is 100, which should be reasonable to use without customizing.

It is frustrating that any caller using count-total, or the somewhat incorrectly named --page-count-total from the CLI (which is a result count, not a page count), will iterate the full collection just to get a count, even with page limits etc. in place.
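
For callers who want to avoid the count, a minimal Python sketch of key-based paging against the REST endpoint quoted earlier in this thread might look like the following. The helper names are hypothetical. The response fields `delegation_responses` and `pagination.next_key`, and the `pagination.key` query parameter, are assumptions about the REST gateway's JSON shape and should be checked against the node's actual responses. The sketch never sets `pagination.count_total` or `pagination.offset`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def delegations_url(base_url, validator_addr, limit=100, next_key=None):
    """Build one page URL using key-based pagination, without count_total."""
    params = {"pagination.limit": str(limit)}
    if next_key:
        # Assumed to be the base64 next_key string returned by the previous page.
        params["pagination.key"] = next_key
    path = f"/cosmos/staking/v1beta1/validators/{quote(validator_addr)}/delegations"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_delegations(base_url, validator_addr, limit=100, get_json=http_get_json):
    """Yield delegation entries page by page until no next_key is returned."""
    next_key = None
    while True:
        page = get_json(delegations_url(base_url, validator_addr, limit, next_key))
        yield from page.get("delegation_responses", [])
        next_key = (page.get("pagination") or {}).get("next_key")
        if not next_key:
            break
```

A caller would loop with something like `for d in iter_delegations("http://localhost:1317", valoper): ...`. This only avoids the extra counting pass; on nodes where the query itself walks the whole delegations store, as discussed above, individual pages may still be slow.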