provider: detect closed leases via API return values

boz commented 3 years ago

Context here: there are apparently cases where the provider thinks that a lease is open when it is not.

This can happen for a number of reasons (tendermint pubsub dropping events, a bid closing in a way that we don't detect, etc.), but we should not get stuck retrying withdrawals on a lease that has already been closed.

The withdraw response is handled here. Ideally, any time we detect that a lease is closed, all processes working on that lease would be shut down, not just the withdraw. If that is too involved, then we can start with just the withdraw process.

arno01 commented 3 years ago

Probably related: each time I restart the akash provider, it broadcasts Close Bid (which fails with "failed to execute message; message index: 0: unknown lease for bid") for seemingly random DSEQs, e.g. 1346631 and 1598194, for deployments created by akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda.

Failed Close Bid transactions from restarting the akash provider 3 times just now:

https://www.mintscan.io/akash/txs/7A35D546C4B20E324D4295817D22E3006A43AAA15E7E14902EAFA3AEB861DAD1
https://www.mintscan.io/akash/txs/9E1882AFB2E59C506D63A946E46D441B89F5BF5D80A0E627BAC8DF994C28921E
https://www.mintscan.io/akash/txs/04C4536A179117637D4E093930BE00D9C245ECDC73F5C1B60D09E42A1718F6B7

Here is when akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda attempted to create a deployment on my provider. Interestingly, if I did my query correctly, there was only a single attempt, for DSEQ 1597593. The DSEQs for which my akash provider attempts to Close Bid aren't even there:

$ akash query txs --events "message.sender=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda" --page 1 --limit 3000 > akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda.txs

$ cat akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda.txs | jq -r '.txs[] | [ .timestamp, .txhash, (.tx.body.messages[] | ."@type"), (if .tx.body.messages[].bid_id == null then "" else .tx.body.messages[].bid_id[] end) ] | @csv' 
...
"2021-06-30T20:03:30Z","06A05BB4A34CC63FD698791CB5EE768397D661317711BA054C51FED68CE3E6A5","/akash.deployment.v1beta1.MsgCreateDeployment",""
"2021-06-30T20:04:26Z","9D961BCBEBB0BE9DFB58715DC60C59E02696CE2E47EB64780DD4D80F568DEA52","/akash.market.v1beta1.MsgCreateLease","akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda","1597593",1,1,"akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"
"2021-06-30T20:07:16Z","EB7CFD0B7ADE6B893AD84E20B160D70FA4E4FEC36C77ADD64BC7143D53A51D03","/akash.deployment.v1beta1.MsgCloseDeployment",""
...

My akash provider's address akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0

Let me know if you need any additional information.

arno01 commented 3 years ago

Looks like my provider is now seeing these issues from different clients (owners). It's not as frequent as before, but here it is:

My provider https://www.mintscan.io/akash/account/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0

Create Bid success https://www.mintscan.io/akash/txs/37E4042727170E6A91C04479217C6779F40DF47FAB48A15DBD8BBE34F584A566
Close Bid fail https://www.mintscan.io/akash/txs/694FEC03099C1415AED15663069E64B927175FC92856696E8340F1254EC774EE
Create Bid success https://www.mintscan.io/akash/txs/FA58E59F3D7C5A5DD09D9C127F92A885AA3831534F84BE4EDEC44CB1317CEC83
Close Bid fail https://www.mintscan.io/akash/txs/4A6734576C2D4A077D6DFE34FD9F49D31D8EEA752F2E28D6FEFC5C95CAFBF2ED

hydrogen18 commented 2 years ago

It's very likely we're losing events from tendermint / cosmos about lease close events on the blockchain.

@boz We already have the deploymentManager that lives for the duration of the lease. While it is possible to use the withdrawal attempt information to determine if the lease is still open, is there any problem with just adding something into the deployment manager that directly polls the lease status? It's basically free (no fees since it isn't a transaction). I can make the frequency of the polling configurable & set it to a fairly high value by default (~1 hr).

andy108369 commented 1 year ago

Looks to be fixed. I haven't seen this issue in years.

akash-network / support

provider: detect closed leases via API return values #52