chef-boneyard / chef-init

PID1 for your Chef containers
21 stars 4 forks source link

Do services defined in first-boot.json get started even if chef-init can't connect to chef server? #28

Open snayakie opened 9 years ago

snayakie commented 9 years ago

On startup of a chef-container based docker container, suppose chef-init can't connect to chef server for some reason. So chef-init shows errors and will exit. Do the services defined in first-boot.json still get started?

tduffield commented 9 years ago

chef-init does not connect to the chef server - the chef-client does. If you are running a container based on an image that has already been converged then chef-init will run runsvdir which will run the services.

snayakie commented 9 years ago

Let me explain the scenario.

I built an image in one environment, and the containers from that work fine in that environment.

I now take and deploy that image in a different environment, and run container, but the chef-server that's baked in the image can't be contacted for some reason, so chef-client gives errors in the container logs, as follows:

docker logs liberty2

[2014-10-24T10:09:28-04:00] INFO: Starting Supervisor... [2014-10-24T10:09:28-04:00] INFO: Supervisor pid: 7 [2014-10-24T10:09:33-04:00] INFO: Starting chef-client run... [2014-10-24T10:09:34-04:00] INFO: Forking chef instance to converge... [2014-10-24T10:09:34-04:00] WARN:


SSL validation of HTTPS requests is disabled. HTTPS connections are still encrypted, but chef is not able to detect forged replies or man in the middle attacks.

To fix this issue add an entry like this to your configuration file:

  # Verify all HTTPS connections (recommended)
  ssl_verify_mode :verify_peer

  # OR, Verify only connections to chef-server
  verify_api_cert true

To check your SSL configuration, or troubleshoot errors, you can use the knife ssl check command like so:

  knife ssl check -c /etc/chef/client.rb

[2014-10-24T10:09:34-04:00] INFO: * Chef 11.16.2 * [2014-10-24T10:09:34-04:00] INFO: Chef-client pid: 16 [2014-10-24T10:09:43-04:00] INFO: Client key /etc/chef/secure/client.pem is not present - registering

Chef encountered an error attempting to create the client "445fb46619ae"

Network Error:

There was a network error connecting to the Chef Server: Error connecting to https://9.162.124.161/organizations/naapm/clients - Connection timed out - connect(2)

Relevant Config Settings:

chef_server_url "https://9.162.124.161/organizations/naapm"

If your chef_server_url is correct, your network could be down.

[2014-10-24T10:10:46-04:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out [2014-10-24T10:10:46-04:00] ERROR: Error connecting to https://9.162.124.161/organizations/naapm/clients - Connection timed out - connect(2) [2014-10-24T10:10:46-04:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

So that's fine, expected.

But I don't see the service started, which should be getting started successfully. The first-boot.json has the following:

docker-enter liberty2

cd /etc/chef/first-boot.json

bash: cd: /etc/chef/first-boot.json: Not a directory bash-4.1# cat /etc/chef/first-boot.json { "run_list": [ "recipe[wlp_tnpfa::JSPExamples]" ], "container_service": { "wlp-JSPExamples": { "command": "/etc/init.d/wlp-JSPExamples start" } } }

From your answer above, it appears that chef-init should start the services. But with above first-boot.json, that server is not getting started. I don't see anything in the container log about whether chef-init ran runsvdir.

Where is it going wrong?

tduffield commented 9 years ago

chef-init ALWAYS runs runsvidr. That is the Supervisor that is being referenced. The chef-client run is failing because of http://docs.getchef.com/containers.html#credential-management. The service you outline in the first-boot.json will only start automatically if the associated service resource in the recipe has an action that includes :enable.

snayakie commented 9 years ago

Oh yeah, the chef client error I have simulated myself. That's not the problem. So the supervisor being started is the first line of log. Is there any log of that ? My service is not started after this in this container anyway. So I need to debug that. Some runit log?

snayakie commented 9 years ago

Looking into this more, it appears there might be some bug here, for this case when chef-client has some problem connecting up with chef serfer. If you can please help check this.

The chef recipe /var/chef/cache/cookbooks/wlp/providers/server.rb being referenced in the messages below is this :

recipe_fragment

The messages from "knife docker build. ...." are these: . . . [2014-10-28T11:54:42-04:00] INFO: Processing wlp_server[JSPExamples] action start (wlp_tnpfa::JSPExamples line 19) [2014-10-28T11:54:42-04:00] INFO: Provider for service[wlp-JSPExamples] has been replaced with Chef::Provider::ContainerService::Runit [2014-10-28T11:54:42-04:00] INFO: Provider for service[wlp-JSPExamples] has been replaced with Chef::Provider::ContainerService::Runit [2014-10-28T11:54:42-04:00] WARN: Cloning resource attributes for service[wlp-JSPExamples] from prior resource (CHEF-3694) [2014-10-28T11:54:42-04:00] WARN: Previous service[wlp-JSPExamples]: /var/chef/cache/cookbooks/wlp/providers/server.rb:76:in block in class_from_file' [2014-10-28T11:54:42-04:00] WARN: Current service[wlp-JSPExamples]: /var/chef/cache/cookbooks/wlp/providers/server.rb:99:inblock in class_from_file' [2014-10-28T11:54:42-04:00] INFO: Processing service[wlp-JSPExamples] action nothing (/var/chef/cache/cookbooks/wlp/providers/server.rb line 76) [2014-10-28T11:54:42-04:00] INFO: Processing template[/etc/init.d/wlp-JSPExamples] action create (/var/chef/cache/cookbooks/wlp/providers/server.rb line 83) [2014-10-28T11:54:42-04:00] INFO: template[/etc/init.d/wlp-JSPExamples] backed up to /var/chef/backup/etc/init.d/wlp-JSPExamples.chef-20141028115442.928959 [2014-10-28T11:54:42-04:00] INFO: template[/etc/init.d/wlp-JSPExamples] updated file contents /etc/init.d/wlp-JSPExamples [2014-10-28T11:54:42-04:00] INFO: Processing service[wlp-JSPExamples] action enable (/var/chef/cache/cookbooks/wlp/providers/server.rb line 99) [2014-10-28T11:54:42-04:00] INFO: Provider for service[wlp-JSPExamples] has been replaced with Chef::Provider::ContainerService::Runit [2014-10-28T11:54:42-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: directory[/opt/chef/sv/wlp-JSPExamples] created directory /opt/chef/sv/wlp-JSPExamples [2014-10-28T11:54:42-04:00] INFO: directory[/opt/chef/sv/wlp-JSPExamples] mode changed to 755 [2014-10-28T11:54:42-04:00] INFO: Processing directory[/opt/chef/service] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/down] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/down] created file /opt/chef/sv/wlp-JSPExamples/down [2014-10-28T11:54:42-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/run] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/run] created file /opt/chef/sv/wlp-JSPExamples/run [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/run] updated file contents /opt/chef/sv/wlp-JSPExamples/run [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/run] mode changed to 755 [2014-10-28T11:54:42-04:00] INFO: Processing directory[/var/log/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: directory[/var/log/wlp-JSPExamples] created directory /var/log/wlp-JSPExamples [2014-10-28T11:54:42-04:00] INFO: directory[/var/log/wlp-JSPExamples] mode changed to 755 [2014-10-28T11:54:42-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples/log] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: directory[/opt/chef/sv/wlp-JSPExamples/log] created directory /opt/chef/sv/wlp-JSPExamples/log [2014-10-28T11:54:42-04:00] INFO: directory[/opt/chef/sv/wlp-JSPExamples/log] mode changed to 755 [2014-10-28T11:54:42-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/log/run] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/log/run] created file /opt/chef/sv/wlp-JSPExamples/log/run [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/log/run] updated file contents /opt/chef/sv/wlp-JSPExamples/log/run [2014-10-28T11:54:42-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/log/run] mode changed to 755 [2014-10-28T11:54:42-04:00] INFO: Processing link[/opt/chef/service/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:42-04:00] INFO: link[/opt/chef/service/wlp-JSPExamples] created [2014-10-28T11:54:43-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/down] action delete (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/down] deleted file at /opt/chef/sv/wlp-JSPExamples/down [2014-10-28T11:54:43-04:00] INFO: service[wlp-JSPExamples] enabled [2014-10-28T11:54:43-04:00] INFO: Processing service[wlp-JSPExamples] action start (/var/chef/cache/cookbooks/wlp/providers/server.rb line 99) [2014-10-28T11:54:43-04:00] INFO: Provider for service[wlp-JSPExamples] has been replaced with Chef::Provider::ContainerService::Runit [2014-10-28T11:54:43-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing directory[/opt/chef/service] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/down] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: file[/opt/chef/sv/wlp-JSPExamples/down] created file /opt/chef/sv/wlp-JSPExamples/down [2014-10-28T11:54:43-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/run] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing directory[/var/log/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples/log] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/log/run] action create (dynamically defined) [2014-10-28T11:54:43-04:00] INFO: Processing link[/opt/chef/service/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: service[wlp-JSPExamples] started [2014-10-28T11:54:46-04:00] INFO: template[/etc/init.d/wlp-JSPExamples] sending restart action to servicewlp-JSPExamples [2014-10-28T11:54:46-04:00] INFO: Processing service[wlp-JSPExamples] action restart (/var/chef/cache/cookbooks/wlp/providers/server.rb line 99) [2014-10-28T11:54:46-04:00] INFO: Provider for service[wlp-JSPExamples] has been replaced with Chef::Provider::ContainerService::Runit [2014-10-28T11:54:46-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing directory[/opt/chef/service] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/down] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/run] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing directory[/var/log/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing directory[/opt/chef/sv/wlp-JSPExamples/log] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing file[/opt/chef/sv/wlp-JSPExamples/log/run] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: Processing link[/opt/chef/service/wlp-JSPExamples] action create (dynamically defined) [2014-10-28T11:54:46-04:00] INFO: service[wlp-JSPExamples] restarted [2014-10-28T11:54:47-04:00] INFO: Chef Run complete in 5.304871547 seconds [2014-10-28T11:54:47-04:00] INFO: Running report handlers [2014-10-28T11:54:47-04:00] INFO: Report handlers complete [2014-10-28T11:54:47-04:00] INFO: Deleting client key... ---> cecedbe10ef2 Removing intermediate container 0e839783d707 Step 5 : ENTRYPOINT ["chef-init"] ---> Running in 67ed177d1887 ---> 3ba2781b8bf4 Removing intermediate container 67ed177d1887 Step 6 : CMD ["--onboot"] ---> Running in 08b26e315d91 ---> f419ffa93885 Removing intermediate container 08b26e315d91 Successfully built f419ffa93885 WARNING: Deleted node chef-liberty-build

WARNING: Deleted client chef-liberty-build

first-boot.json is like this: [root@a15b8f43d73e chef]# cat /etc/chef/first-boot.json { "run_list": [ "recipe[wlp_tnpfa::JSPExamples]" ], "container_service": { "wlp-JSPExamples": { "command": "/opt/was/liberty/wlp/bin/server start JSPExamples" } } }

Then I run a container, in which chef-client fails. But the above service is not started by runit.

In this container, I've checked /etc/init.d/wlp-JSPExamples exists. /opt/chef/sv and /opt/chef/service are like this:

[root@a15b8f43d73e ~]# cd /opt/chef [root@a15b8f43d73e chef]# ls -l sv total 4 drwxr-xr-x 4 root root 4096 Oct 28 11:54 wlp-JSPExamples [root@a15b8f43d73e chef]# ls -l service total 0 lrwxrwxrwx 1 root root 28 Oct 28 11:54 wlp-JSPExamples -> /opt/chef/sv/wlp-JSPExamples

/opt/chef/sv/wlp-JSPExamples appears to have correct stuff.

"ps -ef " shows that it should be started, but isn't.

[root@a15b8f43d73e chef]# ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 19:09 ? 00:00:00 /opt/chef/embedded/bin/ruby /usr/bin/chef-init --onboot root 11 1 0 19:09 ? 00:00:00 /opt/chef/embedded/bin/runsvdir -P /opt/chef/service 'log:......................................................................................................................... root 14 11 0 19:09 ? 00:00:00 runsv wlp-JSPExamples root 15 14 0 19:09 ? 00:00:00 svlogd -tt /var/log/wlp-JSPExamples root 176 0 0 19:10 ? 00:00:00 su - root root 177 176 0 19:10 ? 00:00:00 -bash root 195 177 0 19:13 ? 00:00:00 ps -ef [root@a15b8f43d73e chef]#

So I tried to run runsvdir myself, and it gives this message:

[root@a15b8f43d73e chef]# /opt/chef/embedded/bin/runsvdir -P /opt/chef/service runsvdir /opt/chef/service: fatal: unable to start runsv wlp-JSPExamples: file does not exist runsvdir /opt/chef/service: fatal: unable to start runsv wlp-JSPExamples: file does not exist runsvdir /opt/chef/service: fatal: unable to start runsv wlp-JSPExamples: file does not exist . . .

Would you know why this message is coming? All directories look ok.

But if I run "/opt/chef/embedded/bin/sv start wlp-JSPExamples", it starts up fine. So that suggests that runit is setup correctly, but still it doesn't start from runsvdir as above, and also on container startup.