akash-network / support

Akash Support and Issue Tracking

provider should avoid submitting a transaction when it hits OrderMaxBids #31

Open andy108369 opened 2 years ago

andy108369 commented 2 years ago

akash v0.16.4.

The Akash provider hits "too many existing bids (20): unknown provider" errors when it uses a state-sync RPC node, but works fine once we switch it back to the primary (archive) RPC node:

I[2022-07-13|12:31:45.339] syncing sequence                             cmp=client/broadcaster local=511421 remote=511421
I[2022-07-13|12:31:48.706] order detected                               module=bidengine-service order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1
I[2022-07-13|12:31:48.708] group fetched                                module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1
I[2022-07-13|12:31:48.708] requesting reservation                       module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1
D[2022-07-13|12:31:48.708] reservation requested                        module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1 resources="group_id:<owner:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h\" dseq:6726306 gseq:1 > state:open group_spec:<name:\"akash\" requirements:<signed_by:<> > resources:<resources:<cpu:<units:<val:\"1000\" > > memory:<quantity:<val:\"1073741824\" > > storage:<name:\"default\" quantity:<val:\"1073741824\" > > endpoints:<> endpoints:<kind:RANDOM_PORT > > count:1 price:<denom:\"uakt\" amount:\"10000000000000000000000000\" > > > created_at:6726307 "
D[2022-07-13|12:31:48.709] reservation count                            module=provider-cluster cmp=service cmp=inventory-service cnt=-61
I[2022-07-13|12:31:48.709] Reservation fulfilled                        module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1
D[2022-07-13|12:31:50.924] submitting fulfillment                       module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1 price=55.000000000000000000uakt
I[2022-07-13|12:31:51.031] broadcast response                           cmp=client/broadcaster response="code: 0\ncodespace: \"\"\ndata: \"\"\nevents: []\ngas_used: \"0\"\ngas_wanted: \"0\"\nheight: \"0\"\ninfo: \"\"\nlogs: []\nraw_log: '[]'\ntimestamp: \"\"\ntx: null\ntxhash: CE5C3C45D85BC154918E3EBE8B6DFEF235B701485BFC2F3DCE6E4F04B0D35AB2\n" err=null
I[2022-07-13|12:31:51.031] bid complete                                 module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/6726306/1/1
2022/07/13 12:31:53 http: TLS handshake error from 10.233.6.240:57546: EOF
I[2022-07-13|12:31:55.419] syncing sequence                             cmp=client/broadcaster local=511422 remote=511422

The resulting transaction fails with "failed to execute message; message index: 0: too many existing bids (20): unknown provider": https://www.mintscan.io/akash/txs/CE5C3C45D85BC154918E3EBE8B6DFEF235B701485BFC2F3DCE6E4F04B0D35AB2

I haven't tried an RPC node that is neither state-sync nor archive (i.e. the snapshot RPC node).

I'm also seeing "unknown lease for bid: invalid request" in the logs. I think this could be related: the AMS1 host is running older deployments that the state-sync RPC node doesn't have indexed, which may lead to this sort of issue.

andy108369 commented 2 years ago

Based on the above, I think the "too many existing bids (20): unknown provider" message is misleading, since there is no such issue when running akash-provider against the archive RPC node.

tidrolpolelsef commented 2 years ago

I haven't seen this issue before. Is this blocking anything in prod?

troian commented 2 years ago

It is a network parameter that limits each order to 20 bids: https://github.com/ovrclk/akash/blob/7fba1c7b05a37d7d043411a423e383e6c861d10f/x/market/handler/server.go#L42

tidrolpolelsef commented 2 years ago

Thanks for the context. It sounds like the provider should avoid trying to submit a transaction for a new bid in that scenario.
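A minimal sketch of such a guard, in Go. All names here (`BidCounter`, `checkCanBid`, `staticCounter`, `ErrOrderFull`) are hypothetical and are not the provider's actual bidengine API. It assumes the provider can query both the current bid count for an order and the OrderMaxBids parameter before broadcasting, and it assumes an order counts as full once the count reaches the limit.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOrderFull is a hypothetical sentinel returned when bidding would be pointless.
var ErrOrderFull = errors.New("order already has the maximum number of bids")

// BidCounter is a hypothetical view of chain state; the real provider
// would back this with queries against its RPC node.
type BidCounter interface {
	BidCountForOrder(orderID string) (uint32, error)
	OrderMaxBids() (uint32, error)
}

// checkCanBid returns ErrOrderFull (wrapped) when the order has already
// reached OrderMaxBids, so the caller can skip broadcasting a bid.
func checkCanBid(q BidCounter, orderID string) error {
	maxBids, err := q.OrderMaxBids()
	if err != nil {
		return fmt.Errorf("query OrderMaxBids: %w", err)
	}
	count, err := q.BidCountForOrder(orderID)
	if err != nil {
		return fmt.Errorf("query bid count for %s: %w", orderID, err)
	}
	// Assumption: the order is treated as full once count reaches the limit.
	if count >= maxBids {
		return fmt.Errorf("%w: order %s has %d of %d bids", ErrOrderFull, orderID, count, maxBids)
	}
	return nil
}

// staticCounter is an in-memory stand-in used only for illustration.
type staticCounter struct {
	counts map[string]uint32
	max    uint32
}

func (s staticCounter) BidCountForOrder(id string) (uint32, error) { return s.counts[id], nil }
func (s staticCounter) OrderMaxBids() (uint32, error)              { return s.max, nil }
```

A check like this only saves the fee in the common case. Another provider can still bid between the query and the broadcast, so the chain-side limit remains the final authority, and the answer is only as good as the RPC node the provider queries.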

andy108369 commented 2 years ago

I tried limiting the deployment to this specific provider using placement group attributes while still running against the state-sync RPC node, and got the bid without issues.

  placement:
    akash:
      attributes:
        host: akash
        datacenter: equinix-metal-ams1

I have noticed lots of "too many existing bids (20)" errors on the other providers today.

I think it's time we raised OrderMaxBids from 20 to 40 or 100.

$ akash query params subspace market OrderMaxBids -o json | jq
{
  "subspace": "market",
  "key": "OrderMaxBids",
  "value": "20"
}
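If a tool needs this value programmatically, the JSON above can be decoded as in the sketch below. The struct and function names (`paramResponse`, `parseOrderMaxBids`) are hypothetical; the field names are taken from the output shown, and the value is assumed to be a base-10 integer encoded as a JSON string, as it appears there.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// paramResponse mirrors the JSON printed by the params query above.
type paramResponse struct {
	Subspace string `json:"subspace"`
	Key      string `json:"key"`
	Value    string `json:"value"`
}

// parseOrderMaxBids extracts the numeric limit from the query output.
func parseOrderMaxBids(raw []byte) (uint32, error) {
	var p paramResponse
	if err := json.Unmarshal(raw, &p); err != nil {
		return 0, fmt.Errorf("decode params response: %w", err)
	}
	if p.Subspace != "market" || p.Key != "OrderMaxBids" {
		return 0, fmt.Errorf("unexpected param %s/%s", p.Subspace, p.Key)
	}
	// The value arrives as a string, so it has to be parsed explicitly.
	n, err := strconv.ParseUint(p.Value, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("parse OrderMaxBids value %q: %w", p.Value, err)
	}
	return uint32(n), nil
}
```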

Related code paths:

I moved the OrderMaxBids part to https://github.com/ovrclk/akash/issues/1662

I'll rename this issue to "provider should avoid submitting a transaction when it hits OrderMaxBids".

andy108369 commented 2 years ago

Additionally, the "unknown provider" part of the error "too many existing bids (20): unknown provider" needs to be changed.

Perhaps the code comments "ErrUnknownProvider indicates an invalid chain parameter" and "ErrInvalidBid indicates an invalid chain parameter" should be changed as well.

As per @tidrolpolelsef, it looks like a copy-paste error: https://github.com/ovrclk/akash/blob/7fba1c7b05a37d7d043411a423e383e6c861d10f/x/market/types/v1beta2/errors.go#L106
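The misleading suffix can be reproduced with plain error wrapping. The sketch below uses only the Go standard library as a stand-in for the chain's error registry, so it illustrates the mechanism and is not the actual Akash code. The names `errInvalidBid`, `errInvalidBidFixed`, and `tooManyBids` are hypothetical, and the "invalid bid" description is just one possible replacement.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidBid mimics a sentinel whose description was copy-pasted
// from a different error (assumption, based on the thread above).
var errInvalidBid = errors.New("unknown provider")

// errInvalidBidFixed shows the same sentinel with a matching description.
var errInvalidBidFixed = errors.New("invalid bid")

// tooManyBids wraps the sentinel the way a handler might when the
// order has reached its bid limit.
func tooManyBids(sentinel error, maxBids uint32) error {
	return fmt.Errorf("too many existing bids (%d): %w", maxBids, sentinel)
}
```

Calling `tooManyBids(errInvalidBid, 20)` yields the message seen in the failed transaction, while `tooManyBids(errInvalidBidFixed, 20)` reads "too many existing bids (20): invalid bid". Only the sentinel's description needs to change; the wrapping site can stay as it is.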