chef-boneyard / chef-provisioning-aws

AWS driver and resources for Chef that uses the AWS SDK
Apache License 2.0

Random error "ECONNREFUSED" when provisioning a CentOS machine #566

Closed: karim-jaouadi closed this issue 6 years ago

karim-jaouadi commented 6 years ago

Hi,

Summary: This issue is random. While provisioning a CentOS server on AWS (I haven't tried other Unix-like OSes yet, but it does not happen with Windows), the error "Connection refused - connect(2) for SERVER_IP" is raised at the converge stage, when the provisioning node tries to connect via SSH to bootstrap the server. It seems that the provisioning node starts the bootstrap too early, before the SSH port is open on the newly provisioned server.
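One way to confirm the suspected race, or to delay the bootstrap until it is safe, is to poll port 22 until it accepts a TCP connection. The sketch below assumes you can run Ruby on the provisioning node before the bootstrap step. `wait_for_port` and its parameters are hypothetical and are not part of chef-provisioning-aws. The 500-second limit and 10-second interval simply mirror the waiting loop in the log.

```ruby
require 'socket'

# Hypothetical helper (not part of chef-provisioning-aws): block until
# host:port accepts a TCP connection, or give up after max_wait seconds.
# Returns true once the port is reachable, false if the deadline passes.
def wait_for_port(host, port = 22, max_wait: 500, interval: 10, connect_timeout: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + max_wait
  loop do
    begin
      # Opening and immediately closing a connection proves something is listening.
      Socket.tcp(host, port, connect_timeout: connect_timeout).close
      return true
    rescue SystemCallError
      # Errno::ECONNREFUSED while sshd is not yet listening, Errno::ETIMEDOUT when
      # connect_timeout elapses; both are SystemCallError subclasses.
      return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      sleep interval
    end
  end
end
```

For example, `wait_for_port(server_ip)` could be called after the instance reports ready and before anything is attempted over SSH, where `server_ip` stands for the machine's address.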

When: First provisioning of a CentOS machine

Reproducibility: Random (roughly once in every 6 runs)

The error does not occur if I:

  1. re-run the recipe, or
  2. terminate the instance, clean it up from the Chef server, and re-run the recipe.
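Because a plain re-run succeeds, another possible workaround is to retry the failing connection with backoff. The sketch below assumes the SSH step can be wrapped in a Ruby block. `with_retries` is a hypothetical helper, not part of chef-provisioning-aws or Chef. At the resource level, Chef's common `retries` and `retry_delay` properties may give a similar effect, though that has not been verified for the `machine` resource.

```ruby
# Hypothetical helper (not part of chef-provisioning-aws): run the block,
# retrying only on Errno::ECONNREFUSED with exponential backoff
# (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises after `attempts` tries.
def with_retries(attempts: 5, base_delay: 2)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Errno::ECONNREFUSED
    raise if tries >= attempts
    sleep(base_delay * 2**(tries - 1))
    retry # re-runs the begin block; `tries` keeps its value
  end
end
```

Retrying only on `ECONNREFUSED` keeps genuine failures (bad keys, wrong user) from being masked by the retry loop.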

Error log:

- waiting for machine_name (i-025792d47a2ffd392 on aws::eu-west-1) to become ready ...
- been waiting 0/500 -- sleeping 10 seconds for machine_name (i-025792d47a2ffd392 on aws::eu-west-1) to become ready ...
- been waiting 10/500 -- sleeping 10 seconds for machine_name (i-025792d47a2ffd392 on aws::eu-west-1) to become ready ...
- been waiting 20/500 -- sleeping 10 seconds for machine_name (i-025792d47a2ffd392 on aws::eu-west-1) to become ready ...
- update node machine_name at https://chef-server.sdt-poc.com/organizations/sdt-dev
-   update run_list from ["recipe[chef-client]"] to ["chef-client"
[2018-03-14T18:11:19+00:00] ERROR: Unable to download /etc/chef/client.pem to /tmp/client.pem.834874636 on centos@SERVER_IP -- Connection refused - connect(2) for SERVER_IP:22
[2018-03-14T18:11:19+00:00] WARN: Unable to clean up /tmp/client.pem.834874636 on centos@SERVER_IP -- Connection refused - connect(2) for SERVER_IP:22

- generate private key (2048 bits)
================================================================================
Error executing action `converge` on resource 'machine[machine_name]'
================================================================================

Errno::ECONNREFUSED
-------------------
Connection refused - connect(2) for SERVER_IP:22

Resource Declaration:
---------------------
# In /var/chef/cache/cookbooks/provisioning_aws/recipes/createStack.rb, starting at line 370

machine machine_name do
  machine_options(merged_template)
  aws_tags machine_tags
  run_list [ 'chef-client' ]
  action :converge
end

Question/Workaround: Is there a way to debug this, or to add a delay before the bootstrap is kicked off?

Thanks

karim-jaouadi commented 6 years ago

This seems to be a duplicate of #559, which was fixed in v3.0.2 (https://github.com/chef/chef-provisioning-aws/pull/564).