akash-network / support

Akash Support and Issue Tracking
Apache License 2.0

provider `0.3.2-rc2` panics because one of the two services in the SDL lacks `global: true` #112

andy108369 closed this issue 1 year ago

andy108369 commented 1 year ago

provider-services 0.3.2-rc2 panics when deploying an SDL with two services where one of them is not exposed globally (its expose entry has no `global: true`).

provider

  "owner": "akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh",
  "host_uri": "https://provider.provider-02.sandbox-01.aksh.pw:8443",

version

provider-services v0.3.2-rc2 sandbox v0.23.2-rc3

SDL

I tested with this SDL, with port 33060 removed, since it generally should not be there: https://github.com/akash-network/awesome-akash/blob/c3a8fedb5685078e7313be51732e178b6b3ca3f8/wordpress/deploy.yaml#L35-L37

Here is a working SDL in which I added `global: true` to the db service and removed port 33060: https://gist.githubusercontent.com/andy108369/733cd5c4a191211885808469d7b4ec35/raw/420c5ecc2c2dc1bcb3cbcb5a3fee9d1d0eca37d1/wordpress-working-sdl.yaml

Logs

root@node1:~# kubectl -n akash-services logs akash-provider-0 |grep ^E

E[2023-08-06|15:32:49.397] adjust inventory for pending reservation     module=provider-cluster cmp=provider cmp=service cmp=inventory-service error="insufficient capacity"
E0806 15:34:22.761207       1 v2.go:104] io: read/write on closed pipe
E[2023-08-06|15:40:05.957] recovered from panic: runtime error: index out of range [1] with length 1 cmp=provider client=kube
E[2023-08-06|15:40:05.957] unable to deploy lid=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v. last known state:
E[2023-08-06|15:40:05.957] deploying workload                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v manifest-group=akash err="kube: internal error"
E[2023-08-06|15:40:05.957] execution error                              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/285119/1/1/akash143ypn84kuf379tv9wvcxsmamhj83d5pg2rfc8v manifest-group=akash state=deploy-active err="kube: internal error"

All provider logs:

https://transfer.sh/QTY7vMopwi/provider.logs
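
For readers unfamiliar with the message: `index out of range [1] with length 1` is what the Go runtime reports when code reads element 1 of a one-element slice. The sketch below is only an illustration of how a two-service SDL with a single global service could lead to that message. It is not the provider's actual code, and the `service` type and the `buildEndpoints`, `unsafeLookup`, and `safeLookup` functions are hypothetical names made up for this example.

```go
package main

import "fmt"

// service is a hypothetical, simplified stand-in for an SDL service.
type service struct {
	Name   string
	Global bool // true when the service has an expose entry with `global: true`
}

// buildEndpoints returns one entry per globally exposed service only,
// so the result can be shorter than the list of services.
func buildEndpoints(services []service) []string {
	var out []string
	for _, s := range services {
		if s.Global {
			out = append(out, s.Name+"-endpoint")
		}
	}
	return out
}

// unsafeLookup assumes endpoints and services have the same length and
// indexes one by the position of the other. The deferred recover turns the
// resulting runtime panic into an error.
func unsafeLookup(services []service) (names []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	endpoints := buildEndpoints(services)
	for i := range services {
		names = append(names, endpoints[i]) // panics once i >= len(endpoints)
	}
	return names, nil
}

// safeLookup keys endpoints by service name and skips non-global services.
func safeLookup(services []service) map[string]string {
	out := make(map[string]string)
	for _, s := range services {
		if s.Global {
			out[s.Name] = s.Name + "-endpoint"
		}
	}
	return out
}

func main() {
	services := []service{
		{Name: "db", Global: false},
		{Name: "wordpress", Global: true},
	}
	_, err := unsafeLookup(services)
	fmt.Println(err)
	fmt.Println(safeLookup(services))
}
```

Whether the provider hit exactly this pattern is an assumption; the sketch only shows that mismatched slice lengths produce the same runtime error text.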

arno01 commented 1 year ago

New logs with provider v0.3.2-rc3 (Artur enabled extra debug info in it).

SDL tested => https://transfer.sh/mKzHRYPPNj/deploy.yaml.1

I[2023-08-07|13:16:22.267] manifest received                            module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280
I[2023-08-07|13:16:22.267] watchdog done                                module=provider-manifest cmp=provider lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280
I[2023-08-07|13:16:22.271] data received                                module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 version=b69728b8071fdd2f08fbcef2153f4b204e2a82cee675ad84010d37a7d52a0998
D[2023-08-07|13:16:22.272] requests valid                               module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 num-requests=1
D[2023-08-07|13:16:22.272] publishing manifest received                 module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 num-leases=1
D[2023-08-07|13:16:22.272] publishing manifest received for lease       module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280 lease_id=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
I[2023-08-07|13:16:22.272] manifest received                            module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
I[2023-08-07|13:16:22.280] hostnames withheld                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash cnt=0
D[2023-08-07|13:16:22.280] no services                                  cmp=provider client=kube lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh service=db
W0807 13:16:22.356779       1 warnings.go:70] unknown field "status"
E[2023-08-07|13:16:22.420] recovered from panic: 
goroutine 346 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x65
github.com/akash-network/provider/cluster/kube.(*client).Deploy.func1()
    github.com/akash-network/provider/cluster/kube/client.go:247 +0x71
panic({0x2d1c440, 0xc001e1fa58})
    runtime/panic.go:884 +0x213
github.com/akash-network/provider/cluster/kube.(*client).Deploy(0xc00055c6c0, {0x3943250, 0xc000d7d2c0}, {0x39272e0, 0xc00131a640})
    github.com/akash-network/provider/cluster/kube/client.go:324 +0x1e90
github.com/akash-network/provider/cluster.(*deploymentManager).doDeploy(0xc0012ab680, {0x39431e0?, 0xc00005e048?})
    github.com/akash-network/provider/cluster/manager.go:378 +0xd5a
github.com/akash-network/provider/cluster.(*deploymentManager).startDeploy.func1()
    github.com/akash-network/provider/cluster/manager.go:273 +0x48
created by github.com/akash-network/provider/cluster.(*deploymentManager).startDeploy
    github.com/akash-network/provider/cluster/manager.go:272 +0xfc
 cmp=provider client=kube
E[2023-08-07|13:16:22.420] unable to deploy lid=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh. last known state:
 cmp=provider client=kube
E[2023-08-07|13:16:22.420] deploying workload                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash err="kube: internal error"
E[2023-08-07|13:16:22.420] execution error                              module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash state=deploy-active err="kube: internal error"
D[2023-08-07|13:16:22.469] purged hostnames                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.469] purged ips                                   module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] teardown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] shutting down                                module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] waiting on dm.wg                             module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
I[2023-08-07|13:16:22.548] shutdown complete                            module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] hostnames released                           module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
D[2023-08-07|13:16:22.548] sending manager into channel                 module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh manifest-group=akash
I[2023-08-07|13:16:22.549] manager done                                 module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1/akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
D[2023-08-07|13:16:22.549] unreserving capacity                         module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] attempting to removing reservation           module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] removing reservation                         module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
I[2023-08-07|13:16:22.549] unreserve capacity complete                  module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/299280/1/1
D[2023-08-07|13:16:22.549] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=0

Entire provider log: https://transfer.sh/tpLhHlvWRn/logs.log

andy108369 commented 1 year ago

No panic with provider 0.3.2-rc4 🥳!


I[2023-08-09|16:54:14.873] order detected                               module=bidengine-service cmp=provider order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
I[2023-08-09|16:54:14.876] group fetched                                module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
I[2023-08-09|16:54:14.877] requesting reservation                       module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
D[2023-08-09|16:54:14.877] reservation requested                        module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1 resources[{resource:{id:1,cpu:{units:{val:1000}},memory:{size:{val:1073741824}},storage:[{name:default,size:{val:1073741824}},{name:wordpress-db,size:{val:1073741824},attributes:[{key:class,value:beta3},{key:persistent,value:true}]}],gpu:{units:{val:0}},endpoints:null},count:1,price:{denom:uakt,amount:10000.000000000000000000}},{resource:{id:2,cpu:{units:{val:1000}},memory:{size:{val:1073741824}},storage:[{name:default,size:{val:1073741824}},{name:wordpress-data,size:{val:1073741824},attributes:[{key:class,value:beta3},{key:persistent,value:true}]}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:10000.000000000000000000}}]=(MISSING)
D[2023-08-09|16:54:14.877] reservation count                            module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=1
I[2023-08-09|16:54:14.877] Reservation fulfilled                        module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
D[2023-08-09|16:54:15.714] submitting fulfillment                       module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1 price=15.000000000000000000uakt

I[2023-08-09|16:54:21.127] bid complete                                 module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/332870/1/1
[332870-1-1]$ akash_status 
Detected provider for 332870/1/1: akash1rk090a6mq9gvm0h6ljf8kz8mrxglwwxsk4srxh
{
  "services": {
    "db": {
      "name": "db",
      "available": 1,
      "total": 1,
      "uris": null,
      "observed_generation": 1,
      "replicas": 1,
      "updated_replicas": 1,
      "ready_replicas": 0,
      "available_replicas": 1
    },
    "wordpress": {
      "name": "wordpress",
      "available": 1,
      "total": 1,
      "uris": [
        "rjo2kb21s5evffgll5o5grqq54.ingress.provider-02.sandbox-01.aksh.pw"
      ],
      "observed_generation": 1,
      "replicas": 1,
      "updated_replicas": 1,
      "ready_replicas": 0,
      "available_replicas": 1
    }
  },
  "forwarded_ports": {},
  "ips": null
}
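
To check the same thing programmatically, here is a small Go sketch that reads a status document shaped like the output above and lists the services that have neither `uris` nor an entry under `forwarded_ports`. The struct and function names (`leaseStatus`, `serviceStatus`, `internalOnly`) are my own, and the field set is limited to what appears in the output above; it is not taken from the provider's client code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// serviceStatus holds the subset of per-service fields used here.
type serviceStatus struct {
	Name string   `json:"name"`
	URIs []string `json:"uris"`
}

// leaseStatus mirrors the top-level keys of the status output.
// forwarded_ports values are kept raw because their shape is not shown above.
type leaseStatus struct {
	Services       map[string]serviceStatus   `json:"services"`
	ForwardedPorts map[string]json.RawMessage `json:"forwarded_ports"`
}

// internalOnly returns the sorted names of services that have no URIs
// and no forwarded ports, i.e. services not reachable from outside.
func internalOnly(raw []byte) ([]string, error) {
	var st leaseStatus
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var names []string
	for name, svc := range st.Services {
		_, forwarded := st.ForwardedPorts[name]
		if len(svc.URIs) == 0 && !forwarded {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Trimmed copy of the status output above.
	raw := []byte(`{
	  "services": {
	    "db": {"name": "db", "uris": null},
	    "wordpress": {"name": "wordpress", "uris": ["rjo2kb21s5evffgll5o5grqq54.ingress.provider-02.sandbox-01.aksh.pw"]}
	  },
	  "forwarded_ports": {},
	  "ips": null
	}`)
	names, err := internalOnly(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // prints [db]
}
```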

As you can see, db gets no forwarded_ports: its port 3306 is exposed only to the wordpress service.

I've also made sure the wordpress service can reach db (the service without `global: true`):

root@wordpress-0:/var/www/html# echo >/dev/tcp/db/3306
root@wordpress-0:/var/www/html# echo $?
0