Closed: ghost closed this issue 8 years ago
Apparently, VMs using this stemcell are failing to boot, at least in us-east-1:
https://cloudfoundry.slack.com/archives/bosh/p1467923069001921
@jmcarp and I observed that the System Log for the VMs isn't even displayed in the AWS Console; the VM shuts down and terminates before anything is logged.
I manually replaced our existing stemcell 3262.2 (built with Jenkins) with one that Pivotal built with concourse, and now bosh staging deploys.
I see some differences between the two stemcells and am working with the Pivotal team to figure out what's up.
Production bosh updated too.
Looking at the debug logs, I found the issue to be that bosh was using an invalid, locally-created AMI (ami-0766de10) instead of the one specified by the bosh.io 3262.2 stemcell (ami-613ef00c).
Note the Description below:
AMI used when VM creation was failing:
aws> ec2 describe-images --filters "Name=image-id,Values=ami-0766de10"
{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "BOSH-9f931b47-049e-40ab-ba9a-a93081e952ce",
            "Hypervisor": "xen",
            "ImageId": "ami-0766de10",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-ec63530a",
                        "VolumeSize": 3,
                        "VolumeType": "standard",
                        "Encrypted": false
                    }
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "144433228153/BOSH-9f931b47-049e-40ab-ba9a-a93081e952ce",
            "RootDeviceType": "ebs",
            "OwnerId": "144433228153",
            "RootDeviceName": "/dev/xvda",
            "CreationDate": "2016-06-30T19:37:18.000Z",
            "Public": true,
            "ImageType": "machine",
            "Description": "Cloud.gov Bosh Lite Stemcell"
        }
    ]
}
AMI specified by the bosh.io 3262.2 stemcell:
aws> ec2 describe-images --filters "Name=image-id,Values=ami-613ef00c"
{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Name": "BOSH-cefe0a60-8cc1-4a64-8666-f9441518cea3",
            "Hypervisor": "xen",
            "SriovNetSupport": "simple",
            "ImageId": "ami-613ef00c",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-ff3df4eb",
                        "VolumeSize": 3,
                        "VolumeType": "standard",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "138312800438/BOSH-cefe0a60-8cc1-4a64-8666-f9441518cea3",
            "RootDeviceType": "ebs",
            "OwnerId": "138312800438",
            "RootDeviceName": "/dev/xvda",
            "CreationDate": "2016-06-28T23:41:03.000Z",
            "Public": true,
            "ImageType": "machine",
            "Description": "bosh-aws-xen-ubuntu-trusty-go_agent 3262.2"
        }
    ]
}
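The Description field is the quickest tell for which build an AMI came from. A minimal sketch for pulling it out of saved describe-images output, assuming the command output was redirected to a file (the file name is an assumption):

```shell
# Hypothetical sketch: extract the Description field from saved
# describe-images JSON. The file name describe-images-output.json
# is an assumption; no jq required.
grep -o '"Description": "[^"]*"' describe-images-output.json
```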
The AMI ID was found in the bosh task nnnn --debug logs.
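For anyone retracing this, a sketch of how one might list every AMI referenced in a saved copy of the debug output (the log file name is an assumption; save the bosh task nnnn --debug output to it first):

```shell
# Sketch: list each unique AMI ID mentioned in a saved bosh debug log.
# The file name bosh-task-debug.log is an assumption.
grep -oE 'ami-[0-9a-f]+' bosh-task-debug.log | sort -u
```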
This issue was due to cg-aws-light-stemcell-builder being broken. In addition, the deployed deploy-bosh pipeline was pulling stemcells from bosh.io rather than from our S3 bucket, which is where cg-aws-light-stemcell-builder writes its output.
When the pipeline was updated to pull from our S3 bucket, Concourse reported "error running command: missing path in request". This was apparently due to worker caching, and the resolution was to rename the resource.
According to this post in the Cloud Foundry #bosh Slack channel, the possible solutions were to either rename the resource or recreate the worker:
https://cloudfoundry.slack.com/archives/bosh/p1467926131001983
PR against upstream here: https://github.com/cloudfoundry-incubator/aws-light-stemcell-builder/pull/4
Determine why new builds are failing. Differences noted by bosh deploy:

Old versions: BOSH: 257.1, stemcell: bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262, bosh-aws-cpi: 53
New versions: BOSH: 257.3, stemcell: bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3262.2, bosh-aws-cpi: 54