Open tomyam1 opened 7 years ago
Our agent doesn't support rebooting an instance and resuming an in-progress deployment after the reboot. What's your use case for doing this?
The heavy lifting of our deploy process is done using an ansible playbook. Part of that playbook is to update all the system packages. As a result, sometimes after the playbook has finished, there is a need for a reboot.
@tomyam1 adding it as a feature request for now.
The IIS website will mount FSx as a virtual directory. FSx needs Active Directory, and Active Directory needs a reboot.
Adding this would simplify a lot of items that force a reboot during installation.
Trying to reboot the instance during deployment causes an error:
Script - scripts/reboot.sh
[stderr]+ reboot
[stderr]
[stderr]Session terminated, terminating shell... ...terminated.
It'd be nice to have a way to tell the CodeDeploy that the instance is going to reboot and it has to wait for it to come up, or is there already a method for doing that?
Encountered a similar issue today. It would actually be useful to have something like this.
I got hit by this gem today as well. It would be good to have it recover.
My use case:
Our Windows/IIS environment requires IIS, AD, CodeDeploy, CloudWatch, and various other security/monitoring agents to be installed and configured on boot, which requires either one or two reboots (I know, Windows ◔◔ ). When our ASG scales up, the deployment begins as soon as the agent registers itself, which may be before AD registration is complete. In this case, deployments fail with no output and no error message, showing only the step they were at when the reboot happened.
If it's too hard to continue the deployment after a reboot, another option would be to add a message like "codedeploy agent did not respond" so that it's clear the problem was with agent/instance rather than the appspec script.
We're looking into providing support for this, as well as for general patching of instances.
@philstrong this is particularly an issue with the deb package not honouring policy-rc.d (see #44 and #107) during an instance refresh. Our ansible config tries to stop and disable it before the deployment kicks in but it's not always successful - especially on larger instances. It seemed worse today with our first instance refresh after the recent 1.3.2 release - maybe it's booting quicker with the SDK v3?
I get the comments in those issues about wanting to provide a system agnostic option to the installer about not starting on installation but it seems even without that, the postinst script should honour policy-rc.d anyway? Having looked at the postinst script it seems an easy enough change:
if systemctl >>/dev/null 2>/dev/null; then
  systemctl enable codedeploy-agent
  systemctl start codedeploy-agent
else
  update-rc.d codedeploy-agent defaults
  service codedeploy-agent start-no-update
fi
Instead of systemctl and service the block should use deb-systemd-invoke and invoke-rc.d like this:
if systemctl >>/dev/null 2>/dev/null; then
  deb-systemd-invoke enable codedeploy-agent
  deb-systemd-invoke start codedeploy-agent
else
  update-rc.d codedeploy-agent defaults
  invoke-rc.d codedeploy-agent start-no-update
fi
Happy to help test this if you need someone with the appropriate setup.
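For readers unfamiliar with the convention being discussed: `invoke-rc.d` and `deb-systemd-invoke` consult `/usr/sbin/policy-rc.d` before acting, and an exit status of 101 from that script means the action is forbidden. The following is a minimal sketch of such a policy script, not something shipped with the agent. The helper name `write_policy_rc_d` is hypothetical, the sketch ignores the `--list` mode of the interface, and it would only take effect if the postinst script used the policy-aware helpers as proposed above.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal policy-rc.d script to the path in $1.
# On a real Debian/Ubuntu host the file would live at /usr/sbin/policy-rc.d.
write_policy_rc_d() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Called as: policy-rc.d [options] <initscript-id> <actions> [<runlevel>]
# Exit 101 = action forbidden by policy, exit 0 = action allowed.

# Skip leading options such as --quiet (this sketch does not handle --list).
while [ $# -gt 0 ]; do
    case "$1" in
        --*) shift ;;
        *) break ;;
    esac
done

# Forbid every action for the CodeDeploy agent, allow everything else.
case "$1" in
    codedeploy-agent|codedeploy-agent.service) exit 101 ;;
    *) exit 0 ;;
esac
EOF
    chmod 0755 "$1"
}
```

During provisioning this might be used as `write_policy_rc_d /usr/sbin/policy-rc.d` before installing the package, followed by `rm /usr/sbin/policy-rc.d` once the instance is ready for the agent to start.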
Any progress on this?
@nicholas78719 it appears not, as yet again I had deployment failures during an instance refresh this morning due to the agent starting the deploy automatically after it was installed.
Yet again I've had a deployment fail this morning during an instance refresh purely because I can't stop the agent quickly enough. I get that you want to implement a platform-agnostic solution, but even with that the deb package should honour policy-rc.d, so I don't understand what the blocker is here.
More instance refresh fails today as the latest focal AMI needs a reboot after installing updates - again why can't the deb packaging honour policy-rc.d? This should be the case irrespective of whether a feature for rebooting the instance during deployment is developed or not.
Again I'm asking why can't the deb packaging honour policy-rc.d - this is unrelated to rebooting the instance with a running agent, just that installing the agent doesn't automatically start it. As it stands this morning there are updates published that require a restart but there is no updated AMI yet:
This means that until a new AMI is published, scaling or refreshing instances has a high probability of failure if we try to reboot them after the cloud-init process is complete as it's a race between stopping the agent after it's automatically started and the agent triggering the deploy. The only alternative is to not reboot them after the cloud-init process is complete and then manually reboot them once the deploy is complete.
I'm really at a loss to understand why a tiny change can't be made to the deb packaging to honour the conventions - can you please enlighten me? 🙏🏻
For anyone else struggling with this, here's a set of ansible steps that downloads the deb package, unpacks it, patches it and then repackages it for installation so that it can be installed during cloud-init without automatically starting:
- name: Check if CodeDeploy Agent is installed
  shell: dpkg-query -W -f='${Status}' codedeploy-agent | grep 'install ok installed'
  register: codedeploy_installed
  failed_when: no
  changed_when: no
- name: Fetch CodeDeploy Agent
  command: creates=/tmp/codedeploy-agent_all.deb aws s3 cp --region {{ aws_region }} s3://aws-codedeploy-{{ aws_region }}/latest/codedeploy-agent_all.deb /tmp/codedeploy-agent_all.deb
  when: codedeploy_installed.rc != 0
- name: Remove any existing working directory
  file:
    path: /tmp/codedeploy-agent
    state: absent
  when: codedeploy_installed.rc != 0
- name: Extract CodeDeploy Agent package
  command: dpkg-deb -x /tmp/codedeploy-agent_all.deb /tmp/codedeploy-agent
  when: codedeploy_installed.rc != 0
- name: Extract CodeDeploy Agent control files
  command: dpkg-deb --control /tmp/codedeploy-agent_all.deb /tmp/DEBIAN
  when: codedeploy_installed.rc != 0
- name: Replace systemctl enable line
  lineinfile:
    path: "/tmp/DEBIAN/postinst"
    regexp: '^ systemctl enable codedeploy-agent$'
    line: ' deb-systemd-invoke enable codedeploy-agent'
  when: codedeploy_installed.rc != 0
- name: Replace systemctl start line
  lineinfile:
    path: "/tmp/DEBIAN/postinst"
    regexp: '^ systemctl start codedeploy-agent$'
    line: ' deb-systemd-invoke start codedeploy-agent'
  when: codedeploy_installed.rc != 0
- name: Replace service start line
  lineinfile:
    path: "/tmp/DEBIAN/postinst"
    regexp: '^ service codedeploy-agent start-no-update$'
    line: ' invoke-rc.d codedeploy-agent start-no-update'
  when: codedeploy_installed.rc != 0
- name: Move control files inside package directory
  command: mv /tmp/DEBIAN /tmp/codedeploy-agent
  when: codedeploy_installed.rc != 0
- name: Build new package file
  command: dpkg -b /tmp/codedeploy-agent /tmp/codedeploy-agent-patched_all.deb
  when: codedeploy_installed.rc != 0
- name: Install patched CodeDeploy Agent package
  apt:
    deb: "/tmp/codedeploy-agent-patched_all.deb"
  when: codedeploy_installed.rc != 0
- name: Enable the CodeDeploy Agent
  service: name=codedeploy-agent state=stopped enabled=yes
  when: codedeploy_installed.rc != 0
Our userdata bash script then ends with the following:
# Reboot if required, otherwise start codedeploy-agent
if [[ -e /var/run/reboot-required ]]; then
  reboot
else
  service codedeploy-agent start
fi
This either reboots the instance, after which the agent will start automatically, or starts the agent manually if no reboot is required.
Hope this is of help to someone.
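For setups without Ansible, the unpack, patch and repackage steps from the playbook above could be sketched in plain shell roughly as follows. The function names `patch_postinst` and `repack_agent` are hypothetical. Unlike the `lineinfile` tasks, the `sed` patterns here accept any amount of leading whitespace and preserve it, which is an assumption about how the postinst script is indented. The sketch also assumes it runs as root, as it would during cloud-init.

```shell
#!/bin/sh
# Rewrites the enable/start calls in the postinst file given as $1 so they go
# through the policy-aware Debian helpers. Leading indentation is preserved.
patch_postinst() {
    sed -E \
        -e 's/^([[:space:]]*)systemctl (enable|start) codedeploy-agent$/\1deb-systemd-invoke \2 codedeploy-agent/' \
        -e 's/^([[:space:]]*)service codedeploy-agent start-no-update$/\1invoke-rc.d codedeploy-agent start-no-update/' \
        "$1" > "$1.patched" || return 1
    # Copy the content back instead of moving the file, so the original
    # file mode (dpkg-deb requires an executable postinst) is kept.
    cat "$1.patched" > "$1" && rm -f "$1.patched"
}

# Unpacks the .deb given as $1, patches its postinst, and builds the result
# as the file named in $2.
repack_agent() {
    workdir="$(mktemp -d)" || return 1
    # -R (--raw-extract) unpacks the data files plus the DEBIAN/ control dir.
    dpkg-deb -R "$1" "$workdir/pkg" || return 1
    patch_postinst "$workdir/pkg/DEBIAN/postinst" || return 1
    dpkg-deb -b "$workdir/pkg" "$2"
}
```

With the package already downloaded as in the playbook, this would be called as `repack_agent /tmp/codedeploy-agent_all.deb /tmp/codedeploy-agent-patched_all.deb`, and the patched package installed afterwards.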
I am also looking for a reboot feature.