hidalgopl closed this 3 years ago
FYI
I think the issue we're hitting is related to the volume size of the AMI. We started seeing it after we resized the attached volume to 500 GiB. With the changes from this PR, kip gets as far as sending the RunInstances request, but instance creation fails and the dedicated host almost immediately switches its state to Pending.
As you can see, there's Client.InvalidParameterCombination: Could not create volume with size 500GiB from snapshot 'snap-064acde5cd376cb6b'. I've added logging of the block devices' volume sizes to check whether the correct numbers are passed to the RunInstances call.
kip-provider-0 kip I0127 17:45:33.909398 1 instances.go:493] Starting instance for node: &{{Node v1} {26b5749b-a378-4a77-9ece-05314c889606 map[] 2021-01-27 17:45:33.90770785 +0000 UTC <nil> map[] 08358c1e-5551-4f58-be7a-c4119665a587 default} {mac1.metal ami-0929815870cdeaa46 false false true { 20G false <nil> false <nil>}} {Creating [] default_buildkite-agent-mac1-metal}}
kip-provider-0 kip I0127 17:45:34.209321 1 instances.go:504] calculated volume size for node: 500
kip-provider-0 kip I0127 17:45:35.209502 1 instances.go:480] checking host h-020fba8d6384e6cc9 availability...
kip-provider-0 kip I0127 17:45:35.281221 1 instances.go:514] Starting node with security groups: [sg-0ee086454488a7451] subnet: 'subnet-2769f140'
kip-provider-0 kip I0127 17:45:35.281250 1 instances.go:516] Block devices for a node
kip-provider-0 kip I0127 17:45:35.281257 1 instances.go:518] Device: /dev/sda1 volume size: 824658735208
kip-provider-0 kip I0127 17:45:36.707965 1 instances.go:552] Started instance: i-01b2507ae043e4275
kip-provider-0 kip I0127 17:45:51.843890 1 instances.go:642] retrying err: ResourceNotReady: failed waiting for successful resource state
There's probably an issue with the volume size, as you can see from the logs.
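One possible reading of the Device line above: 824658735208 is 0xc0017db068 in hex, which falls in the address range where the Go runtime typically places heap objects. Go's fmt package formats a pointer passed to %d as an integer address, and the EC2 SDK's EBS block device struct declares VolumeSize as *int64. So if the new log statement passes that pointer directly (an assumption, since the logging code isn't shown here), the log would print an address instead of 500, and the logged number alone wouldn't prove the requested size is wrong. A minimal stdlib-only sketch; ebsBlockDevice and int64Value are hypothetical stand-ins for the SDK struct and the aws.Int64Value helper:

```go
package main

import "fmt"

// ebsBlockDevice is a hypothetical stand-in for the SDK's EBS block device
// struct, where VolumeSize is a *int64 rather than an int64.
type ebsBlockDevice struct {
	VolumeSize *int64
}

// int64Value is a hypothetical nil-safe dereference helper, similar in
// spirit to aws.Int64Value: it returns the pointed-to value, or 0 if nil.
func int64Value(p *int64) int64 {
	if p == nil {
		return 0
	}
	return *p
}

// formatSizes returns the log line produced by formatting the pointer
// itself and the log line produced by formatting the dereferenced value.
func formatSizes(dev ebsBlockDevice) (withPointer, withValue string) {
	// %d applied to a pointer prints the address as a decimal integer.
	withPointer = fmt.Sprintf("volume size: %d", dev.VolumeSize)
	// Dereferencing first prints the actual size in GiB.
	withValue = fmt.Sprintf("volume size: %d", int64Value(dev.VolumeSize))
	return
}

func main() {
	size := int64(500)
	bad, good := formatSizes(ebsBlockDevice{VolumeSize: &size})
	fmt.Println(bad)  // an address, varies from run to run
	fmt.Println(good) // volume size: 500
}
```

If this is what happened, logging the dereferenced value would show whether 500 is really what reaches RunInstances.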
Ok, so this PR fixes the following issues:
gp2. I've added code that reads the volume type from the AMI's volumes and uses it.
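The volume-type change could look roughly like the sketch below: find the AMI's root device mapping, keep its volume type, and apply the requested size. This is not the PR's code. amiBlockDevice, rootVolumeSpec, the default-type parameter and the sample snapshot ID are hypothetical, and the real SDK types use pointers and nest the EBS fields. The size check reflects a general EBS rule (a volume created from a snapshot can't be smaller than the snapshot); whether that is related to the InvalidParameterCombination error above is unconfirmed.

```go
package main

import (
	"errors"
	"fmt"
)

// amiBlockDevice is a hypothetical, simplified view of one entry in an
// AMI's block device mappings.
type amiBlockDevice struct {
	DeviceName string
	SnapshotID string
	VolumeType string // e.g. "gp2", "gp3", "io1"; empty if unset
	VolumeSize int64  // GiB
}

// rootVolumeSpec picks the AMI's root device, keeps its volume type
// (falling back to defaultType if the AMI doesn't set one), and applies
// the requested size. Sizes smaller than the snapshot are rejected.
func rootVolumeSpec(devices []amiBlockDevice, rootDevice string, requestedGiB int64, defaultType string) (amiBlockDevice, error) {
	for _, d := range devices {
		if d.DeviceName != rootDevice {
			continue
		}
		spec := d
		if spec.VolumeType == "" {
			spec.VolumeType = defaultType
		}
		if requestedGiB < d.VolumeSize {
			return amiBlockDevice{}, fmt.Errorf("requested %d GiB is smaller than snapshot %s (%d GiB)",
				requestedGiB, d.SnapshotID, d.VolumeSize)
		}
		spec.VolumeSize = requestedGiB
		return spec, nil
	}
	return amiBlockDevice{}, errors.New("root device not found in AMI block device mappings")
}

func main() {
	devices := []amiBlockDevice{
		{DeviceName: "/dev/sda1", SnapshotID: "snap-example", VolumeType: "gp3", VolumeSize: 100},
	}
	spec, err := rootVolumeSpec(devices, "/dev/sda1", 500, "gp2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s %d GiB\n", spec.DeviceName, spec.VolumeType, spec.VolumeSize) // /dev/sda1 gp3 500 GiB
}
```

Reading the type from the AMI rather than hardcoding one keeps the launch request consistent with whatever volume type the image's snapshot was registered with.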
There are two issues that this should fix:
state. Should be fixed now.

I'm still getting
from e.client.WaitUntilInstanceRunning(dii). I tried increasing the retry timeout, but it didn't help.