Closed by andy108369 11 months ago.
The provider sometimes reports insufficient capacity, although the deployment usually succeeds on a second attempt. I started seeing this today:
```
$ date; provider_info.sh provider.hurricane.akash.pub
Thu Sep 28 06:36:37 PM CEST 2023
type       cpu     gpu  ram                 ephemeral           persistent
used       49.5    1    143.5               388.5               500
pending    0       0    0                   0                   0
available  43.395  0    30.681856155395508  1420.2646561246365  713.9642942994833
node       43.395  0    30.681856155395508  1420.2646561246365  N/A
```
```
D[2023-09-28|16:32:54.261] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":43395,"gpu":0,"memory":32944392192,"storage_ephemeral":1524997562430}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":828738306048}},"total_available":{"cpu":43395,"gpu":0,"memory":32944392192,"storage_ephemeral":1524997562430,"storage":{"beta3":767944789872}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
I[2023-09-28|16:32:58.882] order detected module=bidengine-service cmp=provider order=order/akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1
I[2023-09-28|16:32:58.884] group fetched module=bidengine-order cmp=provider order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1
I[2023-09-28|16:32:58.884] requesting reservation module=bidengine-order cmp=provider order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1
D[2023-09-28|16:32:58.884] reservation requested module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1 resources[{resource:{id:1,cpu:{units:{val:2000}},memory:{size:{val:2147483648}},storage:[{name:default,size:{val:1073741824}}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:1000.000000000000000000}},{resource:{id:2,cpu:{units:{val:2000}},memory:{size:{val:8589934592}},storage:[{name:default,size:{val:1073741824}}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:1000.000000000000000000}}]=(MISSING)
I[2023-09-28|16:32:58.884] insufficient capacity for reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1
E[2023-09-28|16:32:58.884] reserving resources module=bidengine-order cmp=provider order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1 err="insufficient capacity"
I[2023-09-28|16:32:58.884] shutting down module=bidengine-order cmp=provider order=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/13006204/1/1
D[2023-09-28|16:33:01.525] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":43395,"gpu":0,"memory":32944392192,"storage_ephemeral":1524997562430}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":828694003712}},"total_available":{"cpu":43395,"gpu":0,"memory":32944392192,"storage_ephemeral":1524997562430,"storage":{"beta3":767900487536}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
```
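As a sanity check, the dump can be compared against the rejected order by hand. The sketch below is a naive single-node check written for this thread; it is not the provider's actual inventory/reservation logic, and `total` and `fits` are hypothetical helpers. It also assumes `provider_info.sh` reports the dump's millicores as cores and its byte counts as GiB (2^30), which is what the matching figures (43395 vs. 43.395, 32944392192 bytes vs. 30.68 GiB) suggest.

```python
import json

GIB = 1024 ** 3  # assumption: provider_info.sh divides byte counts by 2**30

# Node entry from the 16:32:54 "cluster resources dump" (allocatable omitted for brevity).
node = json.loads(
    '{"name": "worker-01.hurricane2",'
    ' "available": {"cpu": 43395, "gpu": 0, "memory": 32944392192,'
    ' "storage_ephemeral": 1524997562430}}'
)
available = node["available"]

# Order 13006204/1/1 as logged in "reservation requested": two resources, count 1 each.
# CPU is in millicores, memory and storage in bytes.
requested = [
    {"cpu": 2000, "gpu": 0, "memory": 2147483648, "storage_ephemeral": 1073741824},
    {"cpu": 2000, "gpu": 0, "memory": 8589934592, "storage_ephemeral": 1073741824},
]


def total(resources):
    """Sum each resource dimension across all requested resources."""
    keys = ("cpu", "gpu", "memory", "storage_ephemeral")
    return {k: sum(r[k] for r in resources) for k in keys}


def fits(need, have):
    """Naive check: every requested quantity is <= what the node reports as available."""
    return all(need[k] <= have[k] for k in need)


need = total(requested)
print(f"available: {available['cpu'] / 1000} cores, {available['memory'] / GIB:.3f} GiB memory")
print(f"requested: {need['cpu'] / 1000} cores, {need['memory'] / GIB:.3f} GiB memory")
print("fits on", node["name"], "->", fits(need, available))
```

By this naive measure the order fits comfortably: 4 cores and 10 GiB of memory requested against 43.395 cores and about 30.68 GiB available.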
```
[https://rpc.akashnet.net:443][default][13001588--1]$ cat deploy.yaml
# Simple deployment.
---
version: "2.0"

services:
  app:
    image: bsord/tetris
    # command:
    #   - "sh"
    #   - "-c"
    # args:
    #   - sleep infinity
    expose:
      - port: 80
        as: 80
        to:
          - global: true
        #accept:
        #  - "tetris.yourdomain.com"

profiles:
  compute:
    app:
      resources:
        cpu:
          units: 1
        memory:
          size: 4Gi
        storage:
          size: 20Gi
  placement:
    akash:
      pricing:
        app:
          denom: uakt
          amount: 1000000

deployment:
  app:
    akash:
      profile: app
      count: 1

$ date; provider_info.sh provider.hurricane.akash.pub
Thu Sep 28 10:56:22 AM CEST 2023
type       cpu     gpu  ram                 ephemeral           persistent
used       55.5    1    156.5               392.5               500
pending    0       0    0                   0                   0
available  37.395  0    17.681856155395508  1416.2646561246365  728.6417311616242
node       37.395  0    17.681856155395508  1416.2646561246365  N/A
```
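The SDL profile above can be traced to the raw values in the `reservation requested` log lines. A minimal sketch, assuming SDL CPU `units` are whole cores that the log reports in millicores, and that `Ki`/`Mi`/`Gi`/`Ti` are powers of 1024 (only these binary suffixes are handled); `parse_size` and `cpu_millis` are hypothetical helpers, not part of the Akash tooling.

```python
import re

# Binary size suffixes, assumed to be powers of 1024 (only these are handled here).
SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


def parse_size(value: str) -> int:
    """Convert an SDL size string such as '4Gi' (or a bare byte count) to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]i)?", value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, suffix = match.groups()
    return int(number) * (SUFFIXES[suffix] if suffix else 1)


def cpu_millis(units: float) -> int:
    """SDL cpu units are cores; the reservation log reports millicores."""
    return round(units * 1000)


# The "app" compute profile from deploy.yaml above.
profile = {"cpu": 1, "memory": "4Gi", "storage": "20Gi"}
print(cpu_millis(profile["cpu"]), parse_size(profile["memory"]), parse_size(profile["storage"]))
```

This prints `1000 4294967296 21474836480`, the same `val` figures that appear in the reservation request for 13001588.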
13001588: insufficient capacity for reservation
```
$ grep -C30 -E 13001588 hurricane-provider.log | grep -Ev 'check|ip'
D[2023-09-28|08:55:00.933] service available replicas below target module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash16p6lrlxf7f03c0ka8cv4sznr29rym27uv0qz0d/12991178/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=akash cmp=deployment-monitor service=app available=0 target=1
D[2023-09-28|08:55:01.102] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":836884627456}},"total_available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134,"storage":{"beta3":780783722388}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
D[2023-09-28|08:55:05.039] service available replicas below target module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash16p6lrlxf7f03c0ka8cv4sznr29rym27uv0qz0d/12991178/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=akash cmp=deployment-monitor service=app available=0 target=1
I[2023-09-28|08:55:05.122] order detected module=bidengine-service cmp=provider order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1
I[2023-09-28|08:55:05.126] group fetched module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1
I[2023-09-28|08:55:05.126] requesting reservation module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1
D[2023-09-28|08:55:05.126] reservation requested module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1 resources[{resource:{id:1,cpu:{units:{val:1000}},memory:{size:{val:4294967296}},storage:[{name:default,size:{val:21474836480}}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:1000000.000000000000000000}}]=(MISSING)
I[2023-09-28|08:55:05.126] insufficient capacity for reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1
E[2023-09-28|08:55:05.126] reserving resources module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1 err="insufficient capacity"
I[2023-09-28|08:55:05.126] shutting down module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1
D[2023-09-28|08:55:08.008] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":836876632064}},"total_available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134,"storage":{"beta3":780775726996}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
```
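To pull the requested totals out of a `reservation requested` line without eyeballing it, a regular expression over the Go-style struct dump is enough for these logs. This is a hypothetical helper: the pattern assumes exactly one storage volume per resource and the field order seen above, and it silently skips entries formatted differently.

```python
import re

# One entry of the "resources[...]" field (Go-style struct dump, not JSON).
# Assumes a single storage volume per resource, as in these logs.
RESOURCE_RE = re.compile(
    r"cpu:\{units:\{val:(?P<cpu>\d+)\}\},"
    r"memory:\{size:\{val:(?P<memory>\d+)\}\},"
    r"storage:\[\{name:\w+,size:\{val:(?P<storage>\d+)\}\}\],"
    r"gpu:\{units:\{val:(?P<gpu>\d+)\}\}"
    r".*?count:(?P<count>\d+)"
)


def requested_totals(log_line: str) -> dict:
    """Sum cpu (millicores), memory/storage (bytes) and gpu over all resources, times count."""
    totals = {"cpu": 0, "memory": 0, "storage": 0, "gpu": 0}
    for match in RESOURCE_RE.finditer(log_line):
        count = int(match["count"])
        for key in totals:
            totals[key] += int(match[key]) * count
    return totals


line = (
    "resources[{resource:{id:1,cpu:{units:{val:1000}},memory:{size:{val:4294967296}},"
    "storage:[{name:default,size:{val:21474836480}}],gpu:{units:{val:0}},"
    "endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:1000000.000000000000000000}}]"
)
print(requested_totals(line))
```

For 13001588 this prints `{'cpu': 1000, 'memory': 4294967296, 'storage': 21474836480, 'gpu': 0}`, well under the `available` figures in the dump logged about four seconds earlier (37395 millicores, 18985748480 bytes of memory).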
13001692: Reservation fulfilled
```
$ grep -C30 -E 13001692 hurricane-provider.log | grep -Ev 'check|ip'
D[2023-09-28|09:05:32.917] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":838743359488}},"total_available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134,"storage":{"beta3":782640832404}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
I[2023-09-28|09:05:34.077] order detected module=bidengine-service cmp=provider order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1
I[2023-09-28|09:05:34.080] group fetched module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1
I[2023-09-28|09:05:34.080] requesting reservation module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1
D[2023-09-28|09:05:34.080] reservation requested module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1 resources[{resource:{id:1,cpu:{units:{val:1000}},memory:{size:{val:4294967296}},storage:[{name:default,size:{val:21474836480}}],gpu:{units:{val:0}},endpoints:[{sequence_number:0}]},count:1,price:{denom:uakt,amount:1000000.000000000000000000}}]=(MISSING)
D[2023-09-28|09:05:34.080] reservation count module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=10
I[2023-09-28|09:05:34.080] Reservation fulfilled module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1
D[2023-09-28|09:05:34.764] submitting fulfillment module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1 price=15.313822000000000000uakt
I[2023-09-28|09:05:36.895] filtering pods cmp=provider client=kube labelSelector=
D[2023-09-28|09:05:37.852] cluster resources dump={"nodes":[{"name":"worker-01.hurricane2","allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054},"available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134}}],"total_allocatable":{"cpu":102000,"gpu":1,"memory":210936590336,"storage_ephemeral":1942146261054,"storage":{"beta3":838713409536}},"total_available":{"cpu":37395,"gpu":0,"memory":18985748480,"storage_ephemeral":1520702595134,"storage":{"beta3":782610882452}}} module=provider-cluster cmp=provider cmp=service cmp=inventory-service
I[2023-09-28|09:05:40.160] bid complete module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1
```
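The per-DSEQ summaries ("insufficient capacity for reservation" vs. "Reservation fulfilled") were read off the grep output manually. A hypothetical sketch that derives the same summary from raw log lines, keyed on the message strings seen in this log; it records the first reservation result per DSEQ and ignores lines without an `order=` field. The message strings are taken from these logs and may differ in other provider versions.

```python
import re

# Matches both "order=order/<owner>/<dseq>/<gseq>/<oseq>" and "order=<owner>/<dseq>/<gseq>/<oseq>".
ORDER_RE = re.compile(r"order=(?:order/)?akash1\w+/(?P<dseq>\d+)/\d+/\d+")


def reservation_outcomes(lines):
    """Map each DSEQ to the first reservation result logged for it."""
    outcomes = {}
    for line in lines:
        match = ORDER_RE.search(line)
        if match is None:
            continue
        if "insufficient capacity" in line:
            outcomes.setdefault(match["dseq"], "insufficient capacity")
        elif "Reservation fulfilled" in line:
            outcomes.setdefault(match["dseq"], "reservation fulfilled")
    return outcomes


log = [
    "I[2023-09-28|08:55:05.126] insufficient capacity for reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001588/1/1",
    "I[2023-09-28|09:05:34.080] Reservation fulfilled module=bidengine-order cmp=provider order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13001692/1/1",
]
print(reservation_outcomes(log))
```

This prints `{'13001588': 'insufficient capacity', '13001692': 'reservation fulfilled'}`; in practice the lines would come from iterating over `hurricane-provider.log`.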
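Comparing the cluster resources dumps logged just before each DSEQ: available CPU, memory and ephemeral storage are identical (`cpu: 37395`, `memory: 18985748480`, `storage_ephemeral: 1520702595134`); only the `beta3` persistent storage totals differ slightly.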
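The dump comparison can be automated with a small recursive diff. A sketch, with the `total_available` objects transcribed from the dumps logged just before 13001588 (08:55:01) and 13001692 (09:05:32); `diff_dicts` is a hypothetical helper.

```python
def diff_dicts(a: dict, b: dict, prefix: str = "") -> dict:
    """Recursively collect dotted key paths whose values differ between two nested dicts."""
    changes = {}
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key), b.get(key)
        path = f"{prefix}{key}"
        if isinstance(va, dict) and isinstance(vb, dict):
            changes.update(diff_dicts(va, vb, path + "."))
        elif va != vb:
            changes[path] = (va, vb)
    return changes


# "total_available" from the dump logged just before each order.
before_13001588 = {"cpu": 37395, "gpu": 0, "memory": 18985748480,
                   "storage_ephemeral": 1520702595134, "storage": {"beta3": 780783722388}}
before_13001692 = {"cpu": 37395, "gpu": 0, "memory": 18985748480,
                   "storage_ephemeral": 1520702595134, "storage": {"beta3": 782640832404}}
print(diff_dicts(before_13001588, before_13001692))
```

This prints `{'storage.beta3': (780783722388, 782640832404)}`.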
I can't reproduce this after disabling unattended upgrades.
Unattended upgrades were likely the root cause; see https://github.com/akash-network/support/issues/131
I saw the same issue earlier today on the same provider.