Jinkxed closed this issue 7 years ago.
Also having this issue
The issue seems to be with an updated aws-sdk-core gem having conflicts with an older one.
Removing the old aws-sdk-core gem seems to resolve it.
That's a very odd error message and fix combination.
Can you please provide as much information as possible about your hosts (OS, kernel version, etc.)? It would very much help us in root-causing this as quickly as possible.
Thanks, -Asaf
Latest Amazon Linux AMI. Latest kernel versions.
Installed via the puppet module: https://github.com/walkamongus/puppet-codedeploy/
Had the aws-sdk-core gem locked to version 2.2.3, as any version later than that seemed to break codedeploy-agent. This was done a while back, and part of fixing the current issue was removing this lock.
Performing these steps resolved the issue:
Removed the puppet code locking the host OS's aws-sdk-core gem to 2.2.3.
```shell
cd /opt/codedeploy-agent
bundle install
/etc/init.d/codedeploy-agent restart
```
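The recovery steps above can be sketched as one script. This is a hedged sketch, not an official procedure: it assumes the agent lives at the default /opt/codedeploy-agent path and that removing the stale pinned gem (as suggested earlier in the thread) is safe on your hosts. DRY_RUN defaults to 1 so the script only prints what it would do; set DRY_RUN=0 on a real host to execute.

```shell
#!/bin/sh
# Sketch of the recovery steps from this thread. DRY_RUN=1 (the default)
# prints each command instead of running it, so you can review first.
DRY_RUN=${DRY_RUN:-1}
AGENT_DIR=/opt/codedeploy-agent   # assumed default install location
planned=""

run() {
  planned="$planned$*; "          # record the command for inspection
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"                   # dry run: show the command only
  else
    "$@"                          # real run: execute it
  fi
}

run gem uninstall aws-sdk-core -aIx          # drop the old pinned gem versions
run sh -c "cd $AGENT_DIR && bundle install"  # reinstall the agent's bundled gems
run /etc/init.d/codedeploy-agent restart     # restart so the fresh gems load
```

Running it once with the default dry run lets you sanity-check the commands before committing to them on a fleet.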
Prior puppet code that I was using to lock the host OS gem:

```puppet
# This is a pain in the butt way to ensure aws-sdk-core doesn't have the newest 2.3.0
# gem, which is incompatible with codedeploy-agent and straight up breaks it.
exec { 'remove-aws-sdk-core-incompatible-versions':
  command => 'gem uninstall aws-sdk-core -aqIx',
  unless  => 'test `gem list --local | grep -q "aws-sdk-core (2.2.37)"; echo $?` -ne 1',
  path    => ['/usr/bin', '/bin'],
}

package { 'aws-sdk-core':
  ensure   => '2.2.37',
  provider => 'gem',
  require  => Exec['remove-aws-sdk-core-incompatible-versions'],
}
```
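The `unless` guard in that exec is a little convoluted (it routes grep's exit status through `echo $?` and inverts it with `test -ne 1`). The decision it makes can be shown more plainly in shell; the `gem_list` variable below is a stand-in for real `gem list --local` output, used here only for illustration:

```shell
#!/bin/sh
# Stand-in for `gem list --local` output; on a real host you would use
# the actual command instead of this hardcoded sample.
gem_list="aws-sdk-core (2.2.37)
aws-sdk-s3 (1.0.0)"

# Skip the uninstall when the pinned 2.2.37 release is already installed;
# otherwise remove every aws-sdk-core version so Puppet can reinstall 2.2.37.
if echo "$gem_list" | grep -q "aws-sdk-core (2.2.37)"; then
  action="skip"        # corresponds to the exec's `unless` succeeding
else
  action="uninstall"   # exec would run `gem uninstall aws-sdk-core -aqIx`
fi
echo "$action"
```

With the sample list above, the pinned version is present, so the script prints `skip` and the uninstall would not run.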
Which version of ruby did you have installed? What version of the gem was incompatible? Is 2.2.37 the compatible one or the incompatible one?
Any chance you can help by providing steps to reproduce this error? I've been unable to reproduce it but I would really love to fix this permanently so it doesn't bother you or other customers again.
I'd love to, but it will have to be on Thursday. I'm heading out of town for the 4th.
I'll try and get everything together for you when I get back.
Thank you, have a good vacation and a happy 4th!
Also, now or when you get back, could you try running:

```shell
ls /opt/codedeploy-agent/vendor/gems/
```

and tell me what you see as the output?
Also, I wonder if you know what the output of gem list
was when the error occurred. Since it's fixed now it isn't as relevant, but it might hold a clue as to why the errors are occurring. Maybe the way we're using bundler, which we added to resolve all these different dependency issues, is doing something weird when certain combinations of gems are installed.
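Both of the requested outputs (the vendored gems directory listing and the system gem list) can be gathered in one pass. This is a hypothetical helper, not part of the agent: AGENT_DIR is an assumption (the default install path), and the fallback branches keep it usable on hosts where either piece is missing.

```shell
#!/bin/sh
# Collect the two diagnostics asked for in this thread into one report.
AGENT_DIR=${AGENT_DIR:-/opt/codedeploy-agent}   # assumed default path
report=""

# 1. The agent's vendored gems, if the directory exists.
if [ -d "$AGENT_DIR/vendor/gems" ]; then
  report="vendored gems: $(ls "$AGENT_DIR/vendor/gems")"
else
  report="vendored gems: (directory $AGENT_DIR/vendor/gems not found)"
fi

# 2. The system-wide gem list, if `gem` is on PATH.
if command -v gem >/dev/null 2>&1; then
  report="$report
system gems: $(gem list --local)"
else
  report="$report
system gems: (gem not on PATH)"
fi

echo "$report"
```

Pasting the resulting report into the issue would cover both questions at once.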
Thankfully it should be like this on our previous image, so should be easy to recreate. I'll check when I get back.
Any updates on the fix for this issue? Many customers have been affected over the holiday weekend.
I also ran into this issue last week and reverted to the previous version of the code deploy agent for now.
We ran into this issue as well. It looks like the bad version of the agent was yanked? Newly spun up instances are on agent v1.0-1.1106 and can't find an update to install, whereas our broken instances were using agent v1.0-1.1225.
AWS has acknowledged the issue on the status page:
Starting at 9:30 AM PDT June 30th AWS CodeDeploy released an update to the AWS CodeDeploy agent that prevents agents on some Linux instances from polling for commands in the US-EAST-1 region. The AWS CodeDeploy team is actively working on a fix that will eliminate the need for manual actions on your behalf. In the meantime, if you are experiencing deployment timeout errors, you can fix the issue by uninstalling the CodeDeploy agent and then reinstalling the CodeDeploy agent.
If you're running codedeploy-agent-1.0-1.1225, you can also fix the issue by running the following command:

```shell
sudo service codedeploy-agent restart
```
We will post an update as soon as the issue has been resolved.
AWS guys sorted the issue. I just successfully deployed our API to production.
We released a fix on 7/5 that should have resolved this issue for all customers. You shouldn't need to take any action, as the auto-update feature in the agent should automatically get the latest version.
So out of nowhere today we started getting failed deployments on the vast majority of our servers.
Looking at the instances and the codedeploy agent I see:
The codedeploy agent is dead. Manually going into the directory, running

```shell
bundle install
```

and rebooting the machine fixes it. But rebooting our entire AWS account is kind of a pain in the butt. When I restart the agent after running bundle, I see access denied messages until the box is rebooted. Any way to fix at least this part without rebooting?