Closed · RichardBradley closed this issue 4 years ago
The "pinned version X not available" error happens when a resource that a pipeline job uses is pinned to a version that Concourse can't fetch for some reason.
On your update pipeline, is the control-tower-release resource checking successfully? If you click on it in the UI you should see a list of available versions. We don't pin versions in this pipeline, but someone could have pinned it manually; if it is pinned, the resource box will be purple in the UI. You should be able to force Concourse to detect new versions with fly -t <target> check-resource -r control-tower-self-update/control-tower-release.
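If a manual pin does turn out to be the culprit, fly can inspect and clear it as well. A sketch, assuming a reasonably recent fly/Concourse version (the unpin-resource subcommand is not available in very old releases); <target> is your fly target:

```shell
# List the pipeline's resources; a pinned one is flagged in the listing.
fly -t <target> resources -p control-tower-self-update

# Clear a manual pin so Concourse goes back to tracking the latest version.
fly -t <target> unpin-resource -r control-tower-self-update/control-tower-release

# Then force an immediate check for new versions.
fly -t <target> check-resource -r control-tower-self-update/control-tower-release
```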
Thanks for your reply!
Nothing appears to be pinned in the UI and the control-tower-release resource looks good to me.
I will poke about in fly and see if I can unstick anything.
Here's my control-tower-release resource in the UI:
And here's my self-update job (build?) in the UI:
Self update is successfully discovering all its inputs (see the check mark next to that line in your screenshot). I would guess that the self-update job is paused.
I think it was, thanks!
I ran fly -t xxx unpause-job -j control-tower-self-update/self-update and things have definitely changed.
I think this is related to https://github.com/concourse/concourse/issues/1915 -- "Paused jobs should indicate that they are, well, paused". After reading that bug I now know how to tell if a job is paused or not. It turns out that a "pause" symbol on a job details page (that page is hidden behind a not obviously clickable job title) means that it is live (you can click to pause), and a "play" symbol means that it is paused (you can click to play). I /think/. Who would have guessed that?
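For anyone else confused by the play/pause icons, a less ambiguous way to check is fly's jobs listing, which prints a dedicated paused column. A sketch using the pipeline and job names from this thread; <target> is your fly target:

```shell
# List all jobs in the self-update pipeline; the output includes a
# "paused" column, so there's no icon guessing involved.
fly -t <target> jobs -p control-tower-self-update

# Unpause the job directly if it shows as paused.
fly -t <target> unpause-job -j control-tower-self-update/self-update
```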
I suppose the self update jobs start paused on a new deployment, so that everyone gets a nice surprise in 3 months (6 months?) when their server stops working as the cert expires? ;-)
Thanks for your help with this!
> It turns out that a "pause" symbol on a job details page (that page is hidden behind a not obviously clickable job title) means that it is live (you can click to pause), and a "play" symbol means that it is paused (you can click to play). I /think/. Who would have guessed that?
Ha - I remember being flummoxed by this particular UI design choice when making a website about 20 years ago that played background MIDI files...
> I suppose the self update jobs start paused on a new deployment, so that everyone gets a nice surprise in 3 months (6 months?) when their server stops working as the cert expires? ;-)
We did this to avoid users having downtime that wasn't under their control. Do you think it would be more valuable to have it enabled by default?
The self-update pipeline shouldn't be paused - just the self-update job. The renew-cert job is supposed to trigger every day.
On the UI you can tell if something is paused because it will be light blue.
> On the UI you can tell if something is paused because it will be light blue.
As you can see from the above screenshots, the paused job appeared grey in the UI on my instance. I don't know why it was not blue.
I ran the self-update job and it killed the whole instance :-)
I got the following output and the hostname no longer resolves in DNS. I will poke about and see if I can rebuild it and report back here if it is interesting.
waiting for docker to come up...
Pulling engineerbetter/pcf-ops@sha256:7cab6efb45f85bb59eafe31b6107b73e78c668eda857c20cd5326dfca90fcc36...
sha256:7cab6efb45f85bb59eafe31b6107b73e78c668eda857c20cd5326dfca90fcc36: Pulling from engineerbetter/pcf-ops
4d65b6a51407: Pulling fs layer
007bb40a3d29: Pulling fs layer
....
d528521d0fe2: Pull complete
Digest: sha256:7cab6efb45f85bb59eafe31b6107b73e78c668eda857c20cd5326dfca90fcc36
Status: Downloaded newer image for engineerbetter/pcf-ops@sha256:7cab6efb45f85bb59eafe31b6107b73e78c668eda857c20cd5326dfca90fcc36
Successfully pulled engineerbetter/pcf-ops@sha256:7cab6efb45f85bb59eafe31b6107b73e78c668eda857c20cd5326dfca90fcc36.
+ cd control-tower-release
+ chmod +x control-tower-linux-amd64
+ ./control-tower-linux-amd64 deploy ci.xxx
USING PREVIOUS DEPLOYMENT CONFIG
WARNING: adding record ci.xxx to DNS zone ci.xxx with name Z14SSNJQU991QA
aws_iam_user.blobstore: Refreshing state... (ID: control-tower-ci.xxx-eu-west-1-blobstore)
aws_s3_bucket.blobstore: Refreshing state... (ID: control-tower-ci.xxx-eu-west-1-blobstore)
aws_vpc.default: Refreshing state... (ID: vpc-030402d397718204d)
aws_key_pair.default: Refreshing state... (ID: control-tower-ci.xxx20191104001031012700000001)
aws_iam_user.bosh: Refreshing state... (ID: control-tower-ci.xxx-eu-west-1-bosh)
data.aws_availability_zones.available: Refreshing state...
aws_iam_access_key.blobstore: Refreshing state... (ID: AKIAVD4WQFCT5R2NJCGV)
aws_iam_user_policy.bosh: Refreshing state... (ID: control-tower-ci.xxx-eu-west...tower-ci.xxx-eu-west-1-bosh)
aws_iam_access_key.bosh: Refreshing state... (ID: AKIAVD4WQFCT4SL2ZFXD)
aws_iam_user_policy.blobstore: Refreshing state... (ID: control-tower-ci.xxx-eu-west...-ci.xxx-eu-west-1-blobstore)
aws_subnet.public: Refreshing state... (ID: subnet-0f18ca3d8a54bd9e2)
aws_route_table.rds: Refreshing state... (ID: rtb-0eeab1acf99d9a74d)
aws_security_group.rds: Refreshing state... (ID: sg-078bb5da6f4af3c0e)
aws_subnet.rds_a: Refreshing state... (ID: subnet-05ee1739a5c667359)
aws_security_group.vms: Refreshing state... (ID: sg-09ca977c374da12ef)
aws_internet_gateway.default: Refreshing state... (ID: igw-0b187a2dcaddfed38)
aws_subnet.private: Refreshing state... (ID: subnet-0ed40eb9df7900ecd)
aws_subnet.rds_b: Refreshing state... (ID: subnet-0ef7d833f9a51c5f5)
aws_route_table_association.rds_a: Refreshing state... (ID: rtbassoc-091b31a9d74b3c998)
aws_eip.director: Refreshing state... (ID: eipalloc-0aa6085c8905747ca)
aws_eip.atc: Refreshing state... (ID: eipalloc-03400ea3546e80921)
aws_route.internet_access: Refreshing state... (ID: r-rtb-0e7198a8281ed49f71080289494)
aws_eip.nat: Refreshing state... (ID: eipalloc-014f4204a13dbd0fc)
aws_db_subnet_group.default: Refreshing state... (ID: control-tower-ci.xxx)
aws_route_table_association.rds_b: Refreshing state... (ID: rtbassoc-0d61c1f742b3b798f)
aws_nat_gateway.default: Refreshing state... (ID: nat-091321c149c88f2f1)
aws_db_instance.default: Refreshing state... (ID: terraform-20191104001038363100000002)
aws_route53_record.concourse: Refreshing state... (ID: Z14SSNJQU991QA_ci.xxx_A)
aws_security_group.director: Refreshing state... (ID: sg-0a8f426dd911b42f1)
aws_route_table.private: Refreshing state... (ID: rtb-0ca454bef44939b85)
aws_route_table_association.private: Refreshing state... (ID: rtbassoc-0e688e067a960001f)
aws_security_group.atc: Refreshing state... (ID: sg-05389c3dd5a020c21)
aws_route53_record.concourse: Destroying... (ID: Z14SSNJQU991QA_ci.xxx_A)
aws_eip.atc: Modifying... (ID: eipalloc-03400ea3546e80921)
tags.Name: "" => "control-tower-ci.xxx-atc"
tags.name: "control-tower-ci.xxx-atc" => ""
aws_eip.director: Modifying... (ID: eipalloc-0aa6085c8905747ca)
tags.Name: "" => "control-tower-ci.xxx-director"
tags.name: "control-tower-ci.xxx-director" => ""
aws_eip.nat: Modifying... (ID: eipalloc-014f4204a13dbd0fc)
tags.Name: "" => "control-tower-ci.xxx-nat"
tags.name: "control-tower-ci.xxx-nat" => ""
aws_eip.nat: Modifications complete after 0s (ID: eipalloc-014f4204a13dbd0fc)
aws_eip.director: Modifications complete after 0s (ID: eipalloc-0aa6085c8905747ca)
aws_eip.atc: Modifications complete after 0s (ID: eipalloc-03400ea3546e80921)
aws_nat_gateway.default: Modifying... (ID: nat-091321c149c88f2f1)
tags.Name: "" => "control-tower-ci.xxx"
tags.name: "control-tower-ci.xxx" => ""
aws_nat_gateway.default: Modifications complete after 0s (ID: nat-091321c149c88f2f1)
aws_route53_record.concourse: Still destroying... (ID: Z14SSNJQU991QA_ci.xxx_A, 10s elapsed)
aws_route53_record.concourse: Still destroying... (ID: Z14SSNJQU991QA_ci.xxx_A, 20s elapsed)
aws_route53_record.concourse: Still destroying... (ID: Z14SSNJQU991QA_ci.xxx_A, 30s elapsed)
aws_route53_record.concourse: Still destroying... (ID: Z14SSNJQU991QA_ci.xxx_A, 40s elapsed)
aws_route53_record.concourse: Destruction complete after 48s
Error: Error applying plan:
1 error(s) occurred:
* aws_route53_record.concourse: aws_route53_record.concourse: diffs didn't match during apply. This is a bug with Terraform and should be reported as a GitHub Issue.
Please include the following information in your report:
Terraform Version: 0.11.11
Resource ID: aws_route53_record.concourse
Mismatch reason: attribute mismatch: name
Diff One (usually from plan): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{"zone_id":*terraform.ResourceAttrDiff{Old:"Z14SSNJQU991QA", New:"Z14SSNJQU991QA", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "records.#":*terraform.ResourceAttrDiff{Old:"1", New:"1", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "records.4246710445":*terraform.ResourceAttrDiff{Old:"63.35.140.121", New:"63.35.140.121", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "name":*terraform.ResourceAttrDiff{Old:"ci.xxx", New:"", NewComputed:false, NewRemoved:false, NewExtra:"", RequiresNew:true, Sensitive:false, Type:0x0}, "fqdn":*terraform.ResourceAttrDiff{Old:"ci.xxx", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "type":*terraform.ResourceAttrDiff{Old:"A", New:"A", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "ttl":*terraform.ResourceAttrDiff{Old:"60", New:"60", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "allow_overwrite":*terraform.ResourceAttrDiff{Old:"true", New:"true", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}}, Destroy:false, DestroyDeposed:false, DestroyTainted:false, Meta:map[string]interface {}(nil)}
Diff Two (usually from apply): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{"records.#":*terraform.ResourceAttrDiff{Old:"", New:"1", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "records.4246710445":*terraform.ResourceAttrDiff{Old:"", New:"63.35.140.121", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "type":*terraform.ResourceAttrDiff{Old:"", New:"A", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "allow_overwrite":*terraform.ResourceAttrDiff{Old:"", New:"true", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "fqdn":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "zone_id":*terraform.ResourceAttrDiff{Old:"", New:"Z14SSNJQU991QA", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:true, Sensitive:false, Type:0x0}, "ttl":*terraform.ResourceAttrDiff{Old:"", New:"60", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}}, Destroy:false, DestroyDeposed:false, DestroyTainted:false, Meta:map[string]interface {}(nil)}
Also include as much context as you can about your config, state, and the steps you performed to trigger this error.
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
exit status 1
I was planning to rebuild this instance in a different account soon anyway. Maybe that will be easier than fixing this one now. I will remember to turn on the auto-update on the new instance.
Fixed now.
I reached the server by IP address (adding the hostname to my hosts file to trick HTTPS) and was able to re-run the self-update job, which worked the second time.
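For anyone hitting the same expired-cert lockout, the hosts-file trick can also be done per-request with curl's --resolve flag, without editing system files. A sketch; the hostname and IP below are placeholders for your own Concourse domain and its elastic IP:

```shell
# Map the Concourse hostname to its IP for this request only, and skip
# certificate validation with -k since the cert has expired.
# ci.example.com and 203.0.113.10 are placeholders.
curl --resolve ci.example.com:443:203.0.113.10 -k https://ci.example.com/api/v1/info
```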
Then GitHub auth failed, but re-applying the same team permissions seemed to fix that.
Thanks for your help with this
The HTTPS certificate on my control-tower Concourse UI is expired. It looks like there is a job which ought to auto-renew this called "renew-https-cert", but it is hanging with the following output:
Can anyone help me understand what's wrong here? What does the "pinned version is not available" message mean? How can I fix it?
After I fix that, will the permanently spinning task "discovering any new versions of control-tower-release" unblock, or do I have two problems?
Thanks!
Rich