
provider `0.3.2-rc2` stopped bidding after `unknown lease for bid` #111

Closed andy108369 closed 1 year ago

andy108369 commented 1 year ago

provider-services `0.3.2-rc2` stopped bidding after an `unknown lease for bid` broadcast error.

provider

  "owner": "akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh",
  "host_uri": "https://provider.provider-02.sandbox-01.aksh.pw:8443",

version

provider-services `v0.3.2-rc2`, sandbox `v0.23.2-rc3`

logs

D[2023-08-04|21:07:41.980] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=0
D[2023-08-04|21:07:41.980] closing bid                                  module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/257261/1/1 order-id=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/257261/1/1
E[2023-08-04|21:07:41.990] transaction broadcast failed                 cmp=provider cmp=client/broadcaster err="rpc error: code = Unknown desc = rpc error: code = Unknown desc = failed to execute message; message index: 0: unknown lease for bid [cosmos/cosmos-sdk@v0.45.16/baseapp/baseapp.go:781] With gas wanted: '0' and gas used: '62021' : unknown request"
E[2023-08-04|21:07:41.990] unable to broadcast messages                 cmp=provider cmp=client/broadcaster error="rpc error: code = Unknown desc = rpc error: code = Unknown desc = failed to execute message; message index: 0: unknown lease for bid [cosmos/cosmos-sdk@v0.45.16/baseapp/baseapp.go:781] With gas wanted: '0' and gas used: '62021' : unknown request"
E[2023-08-04|21:07:41.990] closing bid                                  module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/257261/1/1 err="rpc error: code = Unknown desc = rpc error: code = Unknown desc = failed to execute message; message index: 0: unknown lease for bid [cosmos/cosmos-sdk@v0.45.16/baseapp/baseapp.go:781] With gas wanted: '0' and gas used: '62021' : unknown request"
D[2023-08-04|21:08:01.836] cluster resources dump={"nodes":[{"name":"node1","allocatable":{"cpu":16000,"gpu":1,"memory":33539747840,"storage_ephemeral":187136925387},"available":{"cpu":14650,"gpu":1,"memory":32891654144,"storage_ephemeral":187136925387}}],"total_allocatable":{"cpu":16000,"gpu":1,"memory":33539747840,"storage_ephemeral":187136925387,"storage":{"beta3":32625043456}},"total_available":{"cpu":14650,"gpu":1,"memory":32891654144,"storage_ephemeral":187136925387,"storage":{"beta3":32625034911}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2023-08-04|21:08:52.295] cluster resources dump={"nodes":[{"name":"node1","allocatable":{"cpu":16000,"gpu":1,"memory":33539747840,"storage_ephemeral":187136925387},"available":{"cpu":14650,"gpu":1,"memory":32891654144,"storage_ephemeral":187136925387}}],"total_allocatable":{"cpu":16000,"gpu":1,"memory":33539747840,"storage_ephemeral":187136925387,"storage":{"beta3":32625043456}},"total_available":{"cpu":14650,"gpu":1,"memory":32891654144,"storage_ephemeral":187136925387,"storage":{"beta3":32625034911}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service

more logs

provider-stopped-bidding.log

checking the provider directly

I can see the order events from within the provider pod, so this doesn't seem to be a networking issue.

root@node1:~# kubectl -n akash-services exec -ti akash-provider-0 -- bash
root@akash-provider-0:/# provider-services version
v0.3.2-rc2
root@akash-provider-0:/# provider-services events
{
  "context": {
    "module": "deployment",
    "action": "deployment-created"
  },
  "id": {
    "owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
    "dseq": 284628
  },
  "version": "H5OQ71y9GyoQef8eYX8LMXJtn91ngRkldA4XG5GpKpo="
}
{
  "context": {
    "module": "market",
    "action": "order-created"
  },
  "id": {
    "owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
    "dseq": 284628,
    "gseq": 1,
    "oseq": 1
  }
}
{
  "context": {
    "module": "market",
    "action": "bid-created"
  },
  "id": {
    "owner": "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h",
    "dseq": 284628,
    "gseq": 1,
    "oseq": 1,
    "provider": "akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v"         <<<< Shimpa's provider `https://provider.shimpa.org:8443`
  },
  "price": {
    "denom": "uakt",
    "amount": "11.604480000000000000"
  }
}
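
For completeness, the on-chain bid list for that order can be cross-checked against the events above. A sketch along these lines should do it (assuming the standard `query market bid list` subcommand is available in `provider-services`, with `$AKASH_NODE` pointing at the sandbox RPC the provider uses):

# hypothetical cross-check, not from the original session:
# list which providers actually placed a bid on dseq 284628
provider-services query market bid list \
  --owner akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h \
  --dseq 284628 \
  --node "$AKASH_NODE" \
  -o json | jq -r '.bids[].bid.bid_id.provider'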
$ curl -sk https://provider.provider-02.sandbox-01.aksh.pw:8443/status | jq -r .
{
  "cluster": {
    "leases": 0,
    "inventory": {
      "available": {
        "nodes": [
          {
            "cpu": 14650,
            "gpu": 1,
            "memory": 32891654144,
            "storage_ephemeral": 187136925387
          }
        ],
        "storage": [
          {
            "class": "beta3",
            "size": 32624494239
          }
        ]
      }
    }
  },
  "bidengine": {
    "orders": 0
  },
  "manifest": {
    "deployments": 0
  },
  "cluster_public_hostname": "provider.provider-02.sandbox-01.aksh.pw",
  "address": "akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh"
}
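
The status output above is a single snapshot; not part of the original capture, but a small loop like the one below can confirm that `bidengine.orders` stays at 0 while new orders keep appearing in `provider-services events`:

# poll the provider status every 30s and print a compact summary
while true; do
  date -u +%FT%TZ
  curl -sk https://provider.provider-02.sandbox-01.aksh.pw:8443/status \
    | jq -c '{orders: .bidengine.orders, leases: .cluster.leases, deployments: .manifest.deployments}'
  sleep 30
done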

additional info

It doesn't appear to be the account sequence mismatch issue since there is no such message in the provider logs.
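
For reference, a quick grep over the pod logs is enough to rule that out; the exact error wording can vary between cosmos-sdk versions, so treat the pattern as a sketch:

# count occurrences of the usual sequence-mismatch error in the last 24h of logs
kubectl -n akash-services logs akash-provider-0 --since=24h --timestamps | grep -ci "account sequence mismatch"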

Potentially related to https://github.com/akash-network/support/issues/92