Cosmos Hub Restake always fails

freak12techno commented 1 year ago

Hello there.

I joined Cosmos Hub 2 days ago as an active validator, and ReStake always fails for me when querying delegators:

[22:31:02.224] Finding delegators...
[22:31:02.347] Sent 0/0 transactions
[22:31:02.347] Failed with error: Failed to get delegations: Request failed with status code 404
[22:31:02.433] Autostake failed after 3 attempt(s)
[22:31:02.433] Attempt 1:
[22:31:02.434] Sent 0/0 transactions
[22:31:02.434] Failed with error: Failed to get delegations: Request failed with status code 404
[22:31:02.434] Attempt 2:
[22:31:02.434] Sent 0/0 transactions
[22:31:02.434] Failed with error: Failed to get delegations: Request failed with status code 501
[22:31:02.434] Attempt 3:
[22:31:02.434] Sent 0/0 transactions
[22:31:02.434] Failed with error: Failed to get delegations: Request failed with status code 404
[22:31:02.434] Autostake failed

My assumption is that this happens because, to get all delegators, it uses an API endpoint like https://api.cosmos.quokkastake.io/cosmos/staking/v1beta1/validators/cosmosvaloper1sjllsnramtg3ewxqwwrwjxfgc4n4ef9u2lcnj0/delegations, which can take a really long time to answer, causing most nodes to time out. I've submitted an issue on the Cosmos SDK repository: https://github.com/cosmos/cosmos-sdk/issues/15162. Basically, they are saying such tools should use indexers to query data. Do you think it's possible to have some workarounds for that? I'm pretty sure I'm not the only one facing this.
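For reference, here is a minimal Python sketch of how a client could walk this endpoint page by page instead of asking for everything at once. It assumes the standard Cosmos SDK REST pagination parameters (`pagination.limit`, `pagination.key`) and the `delegation_responses` / `pagination.next_key` response fields. The function names, the page size, and the 300-second timeout are illustrative assumptions, not what ReStake actually does.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional


def delegations_url(rest_url: str, validator: str, limit: int = 100,
                    next_key: Optional[str] = None) -> str:
    """Build the URL for one page of a validator's delegations."""
    params = {"pagination.limit": str(limit)}
    if next_key:
        # next_key is base64, so it must be URL-encoded (urlencode does this).
        params["pagination.key"] = next_key
    path = f"/cosmos/staking/v1beta1/validators/{validator}/delegations"
    return rest_url.rstrip("/") + path + "?" + urllib.parse.urlencode(params)


def http_get_json(url: str, timeout: float = 300.0) -> dict:
    """Fetch and decode JSON; the generous timeout is an assumption."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_all_delegations(rest_url: str, validator: str,
                          get_json: Callable[[str], dict] = http_get_json,
                          limit: int = 100) -> List[dict]:
    """Follow pagination.next_key until the node reports no more pages."""
    delegations: List[dict] = []
    next_key: Optional[str] = None
    while True:
        page = get_json(delegations_url(rest_url, validator, limit, next_key))
        delegations.extend(page.get("delegation_responses", []))
        next_key = (page.get("pagination") or {}).get("next_key")
        if not next_key:
            return delegations
```

Smaller pages do not make the query cheaper for the node overall, but each individual request is more likely to finish before a proxy or client timeout.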

tombeynon commented 1 year ago

Hey @freak12techno, you're correct: this is because ReStake relies on a few endpoints, like validator delegations, which nodes can struggle with at scale. Many node operators close this endpoint for this reason. It's much more of a problem on large chains like Cosmos Hub, Juno, etc.

Ultimately, you should use your own node for it to be stable. It shouldn't cause any problem at all for your own node, but when lots of restake operators are using the same nodes it causes them to stall. A node could even be shared between a few operators without issue.

There is a restUrl config that you can use instead of the default public nodes. Unfortunately there's not much I can do about this otherwise, but will definitely have a think about using indexing solutions.

freak12techno commented 1 year ago

Just to add, I managed to fix it for myself by 1) using my own node that's not on the cosmos.directory listing, 2) tuning nginx to allow the maximum timeout, and 3) tuning networks.local.json to allow the maximum timeout (as such requests can take up to 5 minutes). This makes some of my other queries on this node fail while ReStake is running, but at least it does its job.
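As a rough illustration of step 3, the Python sketch below generates an override for networks.local.json. Only `restUrl` is named in this thread. The `autostake.delegationsTimeout` key (in milliseconds), the list form of `restUrl`, the `cosmoshub` network key, and the node URL are assumptions or placeholders, so check the ReStake README for the exact option names before relying on this.

```python
import json


def local_override(network: str, rest_url: str, timeout_ms: int) -> dict:
    """Build a networks.local.json fragment for one network.

    Key names other than "restUrl" are assumptions; verify them against
    the ReStake documentation.
    """
    return {
        network: {
            # Point ReStake at a private node instead of the public ones.
            "restUrl": [rest_url],
            # Assumed option: how long to wait for the delegations query.
            "autostake": {"delegationsTimeout": timeout_ms},
        }
    }


if __name__ == "__main__":
    # Hypothetical node URL; 5 minutes matches the worst case seen above.
    override = local_override("cosmoshub", "https://rest.my-node.example",
                              5 * 60 * 1000)
    print(json.dumps(override, indent=2))
```

Whatever timeout is set here only helps if the reverse proxy in front of the node (nginx in this case) allows requests to run at least as long.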

xloem commented 1 year ago

I think a robust solution here would be to add restaking to the original node Go code and offer it both as a contribution to the Cosmos team and as a patch that validators can apply. Second to that would be to work off an existing indexing system, such as a block explorer's source. EDIT: Actually, I think the APIs provide a way to query transactions matching a filter. This would be more robust than iterating over all results and simpler than full indexing. The results could be cached and only retrieved for new block heights. If the grantee and delegatee are hard to filter on, the UI could add something easier to filter on to find them.
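The caching idea can be sketched roughly like this in Python. The fetcher is injected, so the code makes no claim about which endpoint or filter is used; one candidate would be the SDK's transaction search filtered on authz grant messages, but that is an assumption. `GrantCache` and `fetch_grants` are hypothetical names, and the sketch ignores revoked or expired grants.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set, Tuple

# (block height, granter address) pairs extracted from matching transactions.
GrantEvent = Tuple[int, str]
# Hypothetical fetcher: returns grant events with min_height <= height <= max_height.
FetchGrants = Callable[[int, int], Iterable[GrantEvent]]


@dataclass
class GrantCache:
    """Remembers known granters and the last block height already scanned."""
    last_height: int = 0
    granters: Set[str] = field(default_factory=set)

    def update(self, fetch_grants: FetchGrants, latest_height: int) -> int:
        """Scan only unseen heights; return how many new granters were found."""
        if latest_height <= self.last_height:
            return 0  # nothing new on chain since the last scan
        before = len(self.granters)
        for _height, granter in fetch_grants(self.last_height + 1, latest_height):
            self.granters.add(granter)
        self.last_height = latest_height
        return len(self.granters) - before
```

Each run then queries only the blocks produced since the previous run, rather than asking a node to enumerate every delegation again.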

eco-stake / restake

Cosmos Hub Restake always fails #726