Open andy108369 opened 10 months ago
Still happens with provider 0.5.4 on akash network 0.32.2; I have only observed it on the Hurricane provider.
It feels like this issue triggers when the provider is scanning through the leases ("running check" / "check result", which happens constantly and at a high pace on Hurricane, judging by the provider logs) and there is not enough delay between tx update deployment and send-manifest.
Still happens with provider 0.6.2 on akash network 0.36.0.
Example with dseq 17438710: kube-builder just errored with "ClusterParams() returned result of unexpected type (%!s(<nil>))" upon updating the SDL.
Provider logs: 152-hurricane.log
$ cat /tmp/152-hurricane.log | grep -Ev 'operator=ip|running check|check result|below target' | grep 17438710
I[2024-08-13|16:23:13.197] update received module=provider-manifest cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710 version=7C21B33A56D24DDBDFF34960DF02751567DE89C89EEDF01D9B95A26642879BE1
I[2024-08-13|16:23:22.264] manifest received module=manifest-manager cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710
I[2024-08-13|16:23:22.266] data received module=manifest-manager cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710 version=7c21b33a56d24ddbdff34960df02751567de89c89eedf01d9b95a26642879be1
D[2024-08-13|16:23:22.267] requests valid module=manifest-manager cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710 num-requests=1
D[2024-08-13|16:23:22.267] publishing manifest received module=manifest-manager cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710 num-leases=1
D[2024-08-13|16:23:22.267] publishing manifest received for lease module=manifest-manager cmp=provider deployment=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710 lease_id=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
I[2024-08-13|16:23:22.267] manifest received module=provider-cluster cmp=provider cmp=service lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
D[2024-08-13|16:23:22.267] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor
D[2024-08-13|16:23:22.267] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor
I[2024-08-13|16:23:22.272] hostnames withheld module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cnt=0
E[2024-08-13|16:23:22.272] deploying workload module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
E[2024-08-13|16:23:22.272] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
D[2024-08-13|16:23:22.276] purged ips module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] purged hostnames module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] teardown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] waiting on dm.wg module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
I[2024-08-13|16:23:22.297] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] hostnames released module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2024-08-13|16:23:22.297] sending manager into channel module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
I[2024-08-13|16:23:22.297] manager done module=provider-cluster cmp=provider cmp=service lease=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
D[2024-08-13|16:23:22.297] unreserving capacity module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1
I[2024-08-13|16:23:22.297] attempting to removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1
I[2024-08-13|16:23:22.297] removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1
I[2024-08-13|16:23:22.297] unreserve capacity complete module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1qh0f0h7jlq4x5gpxghrxvps5l09y7uuvcumcyd/17438710/1/1
The issue is still present in provider 0.6.4.
Additional logs are stored in the node2.hurricane.akash.pub:/root/issue-152-logs dir.
Spotted the same issue on the Valdi provider for dseqs 17676873 and 17687779.
Complete provider logs are saved in the root@node2.h100.wdc.val.akash.pub:/root/provider-logs-issue-152 dir.
D[2024-08-21|21:22:31.786] running check module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor attempt=1
I[2024-08-21|21:22:31.807] check result module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor ok=true attempt=1
I[2024-08-21|21:22:41.874] update received module=provider-manifest cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873 version=1824113459BC475B447403E58AE0CBF45DB47A89C5E6E295A7F2C27FE3679D56
D[2024-08-21|21:22:43.433] running check module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor attempt=1
I[2024-08-21|21:22:43.453] check result module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor ok=true attempt=1
I[2024-08-21|21:22:50.428] manifest received module=manifest-manager cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873
I[2024-08-21|21:22:50.433] data received module=manifest-manager cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873 version=1824113459bc475b447403e58ae0cbf45db47a89c5e6e295a7f2c27fe3679d56
D[2024-08-21|21:22:50.434] requests valid module=manifest-manager cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873 num-requests=1
D[2024-08-21|21:22:50.434] publishing manifest received module=manifest-manager cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873 num-leases=1
D[2024-08-21|21:22:50.434] publishing manifest received for lease module=manifest-manager cmp=provider deployment=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873 lease_id=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8
I[2024-08-21|21:22:50.434] manifest received module=provider-cluster cmp=provider cmp=service lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8
D[2024-08-21|21:22:50.435] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor
D[2024-08-21|21:22:50.435] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cmp=deployment-monitor
I[2024-08-21|21:22:50.441] hostnames withheld module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud cnt=0
E[2024-08-21|21:22:50.441] deploying workload module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
E[2024-08-21|21:22:50.441] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
D[2024-08-21|21:22:50.445] purged ips module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.452] purged hostnames module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.453] teardown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.453] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.453] waiting on dm.wg module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
I[2024-08-21|21:22:50.453] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.453] hostnames released module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
D[2024-08-21|21:22:50.453] sending manager into channel module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud
I[2024-08-21|21:22:50.453] manager done module=provider-cluster cmp=provider cmp=service lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8
D[2024-08-21|21:22:50.453] unreserving capacity module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1
I[2024-08-21|21:22:50.453] attempting to removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1
I[2024-08-21|21:22:50.453] removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1
I[2024-08-21|21:22:50.453] unreserve capacity complete module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17676873/1/1
E[2024-08-21|21:17:53.841] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash19jqc8tsdtzvm2zd4mcg0vx9fll4feegfduvpp8/17687779/1/1/akash19ah5c95kq4kz2g6q5rdkdgt80kc3xycsd8plq8 manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
Provider v0.6.5-rc6 has some patches which try to fix this issue.
The first week has been pretty smooth with 0.6.5-rc6 on the Hurricane provider! :rocket:
@troian let's release v0.6.5-rc6? It's been running well for the past three weeks on the Hurricane provider.
$ kubectl -n akash-services get pods -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[*].image'
NAME                                                          IMAGE
akash-node-1-0                                                ghcr.io/akash-network/node:0.36.0
akash-provider-0                                              ghcr.io/akash-network/provider:0.6.5-rc6
operator-hostname-79fc5855bb-hk9bc                            ghcr.io/akash-network/provider:0.6.5-rc6
operator-inventory-7cdfdb65d7-msl6c                           ghcr.io/akash-network/provider:0.6.5-rc6
operator-inventory-hardware-discovery-control-01.hurricane2   ghcr.io/akash-network/provider:0.6.5-rc6
operator-inventory-hardware-discovery-worker-01.hurricane2    ghcr.io/akash-network/provider:0.6.5-rc6
operator-ip-796b49c77-k4xgh                                   ghcr.io/akash-network/provider:0.6.5-rc6
$ kubectl -n akash-services get pods -o wide
NAME                                                          READY   STATUS    RESTARTS      AGE   IP               NODE                    NOMINATED NODE   READINESS GATES
akash-node-1-0                                                1/1     Running   1 (44d ago)   44d   10.233.73.131    worker-01.hurricane2    <none>           <none>
akash-provider-0                                              1/1     Running   2 (9d ago)    24d   10.233.73.155    worker-01.hurricane2    <none>           <none>
operator-hostname-79fc5855bb-hk9bc                            1/1     Running   0             24d   10.233.73.161    worker-01.hurricane2    <none>           <none>
operator-inventory-7cdfdb65d7-msl6c                           1/1     Running   0             24d   10.233.73.144    worker-01.hurricane2    <none>           <none>
operator-inventory-hardware-discovery-control-01.hurricane2   1/1     Running   0             24d   10.233.117.178   control-01.hurricane2   <none>           <none>
operator-inventory-hardware-discovery-worker-01.hurricane2    1/1     Running   0             24d   10.233.73.179    worker-01.hurricane2    <none>           <none>
operator-ip-796b49c77-k4xgh                                   1/1     Running   0             24d   10.233.73.181    worker-01.hurricane2    <none>           <none>
$ kubectl -n akash-services logs akash-provider-0 |grep ClusterParams
Defaulted container "provider" out of: provider, init (init)
$ kubectl -n akash-services logs akash-provider-0 --previous |grep ClusterParams
Defaulted container "provider" out of: provider, init (init)
I am still seeing this error (err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))") on the Hurricane provider with k8s v1.27.5 (delivered with kubespray v2.23.0) when sending the manifest to the provider (using the CLI). It is not limited to an image update in the SDL; it also happens on an env update.
It does not happen every time, but rather sporadically.
Provider logs: