Closed (andy108369 closed this 7 months ago)
Another case with the `kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))` error.
This time I have the complete provider logs. (provider v0.4.6)
The `image` name was missing the `@` before `sha256:<sha256>`, though this does not seem to be the cause of the issue (although it could be).
The log file also contained:
E[2023-09-12|00:45:40.165] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/9966260/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
More of the log file https://gist.githubusercontent.com/andy108369/b159c06b4f0d57e13f1915a0b3d94a5f/raw/dc56a3dd240970bd64dcb6d14f4ef23e1493440b/err.log
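The image name problem mentioned above (a missing `@` before `sha256:`) is easy to catch before sending the manifest. A minimal sketch in Python, assuming a digest-pinned reference has the form `<name>@sha256:<64 hex chars>`. The helper name `digest_problem`, the `example/console` image names and the placeholder digest are made up for illustration; the actual malformed string is not shown in this thread.

```python
import re

# Assumption: a digest-pinned image reference looks like <name>@sha256:<64 hex chars>.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")

# Placeholder digest, for illustration only (not a real image digest).
DIGEST = "a" * 64


def digest_problem(image):
    """Return a short description of the problem, or None if nothing looks wrong."""
    if "sha256:" not in image:
        return None  # tag-only reference, nothing to check here
    if DIGEST_REF.match(image):
        return None
    if "@sha256:" not in image:
        return "missing '@' before 'sha256:'"
    return "malformed sha256 digest"


# Hypothetical image references, one well-formed and one without the '@'.
for ref in ("example/console@sha256:" + DIGEST, "example/console:sha256:" + DIGEST):
    print(ref[:32] + "...", "->", digest_problem(ref))
```

This only checks the shape of the reference. As noted later in the thread, the `ClusterParams()` error also occurred without a bad image, so a clean result here does not rule the error out.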
The deployment was terminated; the manifest & namespace were removed by the akash-provider. The bid/lease/order/deployment remained in active/open states as before.
I noticed the bid/lease weren't closed for the prod console, so I've recovered the console manifest the same way as before: https://gist.github.com/arno01/8a97a3f7bdbc3ec8d82d4aa20fd9fab2?permalink_comment_id=4685270#gistcomment-4685270
The deployment received an error (the same one we've seen before, due to which the staging console got closed): manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))" (see https://github.com/akash-network/support/issues/121).
I was able to collect the entire akash-provider log. The error was triggered by running this job (2nd run): https://github.com/ovrclk/console-infrastructure/actions/runs/6156222573/job/16704584131
Logs - The logs of a failure after send-manifest
hurricane-provider-console-kube-builder-ClusterParams-error.log
Hint: just run `grep -w 9966260` against the log file.
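If `grep` is not at hand, or you also want to filter by log level, roughly the same whole-word match can be sketched in Python. This is only an illustration: `matching_lines` is a hypothetical helper, and the two sample lines are copied from the provider logs quoted in this issue.

```python
import re

# Two lines copied from the provider logs quoted in this issue.
SAMPLE = [
    'E[2023-09-12|00:45:40.165] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/9966260/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"',
    'I[2023-11-22|16:45:54.207] manifest received module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605',
]


def matching_lines(lines, dseq, level=None):
    """Yield lines containing dseq as a whole word, optionally limited to one level (D, I or E)."""
    # \b gives whole-word matching, similar in spirit to `grep -w`.
    word = re.compile(r"\b" + re.escape(str(dseq)) + r"\b")
    for line in lines:
        if level is not None and not line.startswith(level + "["):
            continue
        if word.search(line):
            yield line


# Only error-level lines for deployment sequence 9966260.
for hit in matching_lines(SAMPLE, 9966260, level="E"):
    print(hit)
```

Against a real file, the same function can be fed an open file object instead of the `SAMPLE` list.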
I repeated the above steps, except for:
- `akash-provider` pod after recovering the manifest;
- `akash-provider`;

Logs - The logs of a successful result after send-manifest
hurricane-provider-console-NO-kube-builder-ClusterParams-error.log
Hint: just run `grep -w 9966260` against the log file.
This has occurred again on the Hurricane provider.
`err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"` was not related to the bad image version.
I've nuked the deployments and have redeployed them from scratch now:
- Destroy GH action
- Deploy GH action (deployments have enough deposit to run for 376 days)
- console-proxy (staging & prod): added the CORS headers to allow them to talk (and hence deploy) to the provider they are deployed at ( https://github.com/akash-network/support/issues/89 )
- console-proxy (staging & prod): patched `netpol`, as console-proxy needs to access the provider it is deployed at ( https://github.com/akash-network/support/issues/1 )

The issue happens after the provider has been rebuilt from scratch (including the VMs / OS / newer kubespray/K8s):
E[2023-09-28|16:09:37.269] deploying workload module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/12989462/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
E[2023-09-28|16:09:37.269] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h2adh8s6ptsx33m6hda7p9kahcdwy09dhr5x90/12989462/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
(I've sent you the provider logs in Slack)
New guess for `err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"`: a potential reason is unattended Ubuntu upgrades breaking the K8s cluster.
I'll try to disable them, reboot the worker node and see whether this issue occurs again.
Can't reproduce this manually. Have tried:
- `image` field and sending the manifest (3 times);
- `image` field and sending the manifest (2 times);
- `image` field and sending the manifest (2 times);

I believe the enabled unattended upgrades were the root cause of the issue: https://github.com/akash-network/support/issues/131
Nor have I had any reports of this issue. Closing for now.
Re-opening as I am still seeing this error (`err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"`) on the Hurricane provider with K8s `v1.27.5` (delivered with kubespray `v2.23.0`) when sending the manifest to the provider (using the CLI), and that's not limited to the `image` update in the SDL, but also happens with an `env` update.
provider-services 0.4.8 (provider & client [CLI]), akash network 0.28.2
Provider logs:
D[2023-11-22|16:45:37.103] running check module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor attempt=1
I[2023-11-22|16:45:37.135] check result module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor ok=true attempt=1
I[2023-11-22|16:45:47.516] update received module=provider-manifest cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605 version=C761DDE12EAAD74D36ACD78EB57DFF035836BD85C162E9B1A071B38313D57BEE
D[2023-11-22|16:45:48.377] running check module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor attempt=1
I[2023-11-22|16:45:48.403] check result module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor ok=true attempt=1
I[2023-11-22|16:45:54.207] manifest received module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605
I[2023-11-22|16:45:54.210] data received module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605 version=c761dde12eaad74d36acd78eb57dff035836bd85c162e9b1a071b38313d57bee
D[2023-11-22|16:45:54.210] requests valid module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605 num-requests=1
D[2023-11-22|16:45:54.210] publishing manifest received module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605 num-leases=1
D[2023-11-22|16:45:54.210] publishing manifest received for lease module=manifest-manager cmp=provider deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605 lease_id=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
I[2023-11-22|16:45:54.210] manifest received module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
D[2023-11-22|16:45:54.211] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor
D[2023-11-22|16:45:54.211] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cmp=deployment-monitor
I[2023-11-22|16:45:54.219] hostnames withheld module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud cnt=0
E[2023-11-22|16:45:54.219] deploying workload module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
E[2023-11-22|16:45:54.219] execution error module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud state=deploy-active err="kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
D[2023-11-22|16:45:54.232] purged ips module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] purged hostnames module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] teardown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] shutting down module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] waiting on dm.wg module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
I[2023-11-22|16:45:54.248] shutdown complete module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] hostnames released module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
D[2023-11-22|16:45:54.248] sending manager into channel module=provider-cluster cmp=provider cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk manifest-group=dcloud
I[2023-11-22|16:45:54.248] manager done module=provider-cluster cmp=provider cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk
D[2023-11-22|16:45:54.248] unreserving capacity module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1
I[2023-11-22|16:45:54.248] attempting to removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1
I[2023-11-22|16:45:54.248] removing reservation module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1
I[2023-11-22|16:45:54.248] unreserve capacity complete module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1
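The excerpt above follows a fixed shape (level, timestamp, message, then `key=value` pairs), so the failure pattern can be picked out mechanically: an `E` record whose `err` mentions `ClusterParams()`, followed by `teardown complete` for the same lease. A rough sketch in Python, assuming that line shape holds for the whole log. `parse` and `torn_down_after_error` are hypothetical helper names, and the sample lines are rebuilt from the excerpt above.

```python
import re

# Assumed line shape: LEVEL[timestamp] message key=value key="quoted value" ...
LINE = re.compile(r"^([DIE])\[([^\]]+)\]\s+(.*)$")
FIELD = re.compile(r'([\w-]+)=("(?:[^"\\]|\\.)*"|\S+)')

# Values taken from the log excerpt above.
LEASE = "akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/13697605/1/1/akash15tl6v6gd0nte0syyxnv57zmmspgju4c3xfmdhk"
ERR = "kube-builder: ClusterParams() returned result of unexpected type (%!s(<nil>))"
CTX = ("module=provider-cluster cmp=provider cmp=service cmp=deployment-manager "
       "lease=" + LEASE + " manifest-group=dcloud")
SAMPLE = [
    f"I[2023-11-22|16:45:54.219] hostnames withheld {CTX} cnt=0",
    f'E[2023-11-22|16:45:54.219] deploying workload {CTX} err="{ERR}"',
    f'E[2023-11-22|16:45:54.219] execution error {CTX} state=deploy-active err="{ERR}"',
    f"D[2023-11-22|16:45:54.248] teardown complete {CTX}",
]


def parse(line):
    """Split one log line into level, time, message and fields; None if it does not match."""
    m = LINE.match(line)
    if not m:
        return None
    level, stamp, rest = m.groups()
    first = FIELD.search(rest)
    message = rest[: first.start()].strip() if first else rest.strip()
    fields = {}
    for key, value in FIELD.findall(rest):
        # Keys such as cmp repeat on one line, so keep every value in order.
        fields.setdefault(key, []).append(value.strip('"'))
    return {"level": level, "time": stamp, "message": message, "fields": fields}


def torn_down_after_error(lines, needle="ClusterParams()"):
    """Return leases that logged an error containing needle and were later torn down."""
    errored, result = set(), []
    for line in lines:
        rec = parse(line)
        if rec is None or "lease" not in rec["fields"]:
            continue
        lease = rec["fields"]["lease"][0]
        if rec["level"] == "E" and any(needle in e for e in rec["fields"].get("err", [])):
            errored.add(lease)
        elif rec["message"] == "teardown complete" and lease in errored and lease not in result:
            result.append(lease)
    return result


print(torn_down_after_error(SAMPLE))
```

A lease reported here is one the provider tore down on its side; whether the bid/lease is still active on chain has to be checked separately.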
Followed up by https://github.com/akash-network/support/issues/152
provider removed the deployment as it was unable to (re-)deploy it after receiving updated manifest file, leaving bid/lease open/active
provider-services 0.4.6
- 00:42 provider restarts due to the `account sequence mismatch` error (which is the expected mechanism [1] [2] to tackle the case when the provider would just hang (stop bidding on new order requests until manually restarted))
- 00:48 provider fails to update (redeploy) the deployment (send-manifest) (only the image version was bumped; the day before, it worked without issues with the same provider-services v0.4.6)

The provider removed the deployment manifest and its namespace after it failed to deploy it upon receiving it, leaving the bid/lease in the active/open state.
I've manually recovered the lease this way.
Logs