chrisroberts / vagabond

Advocating idleness and work-shyness

LXC instances unable to resolve external hosts #19

Open jaypipes opened 11 years ago

jaypipes commented 11 years ago

Hi again Chris, I'm hoping you've run into something like this already and have a quick fix for me.

One of our cookbooks uses the Chef git resource to download and install a repo from GitHub. Unfortunately, when the instance runs chef-client, the run fails because it cannot resolve github.com:

10.0.3.11 ================================================================================
10.0.3.11 Error executing action `checkout` on resource 'git[/mnt/git/Diamond]'
10.0.3.11 ================================================================================
10.0.3.11 
10.0.3.11 
10.0.3.11 Mixlib::ShellOut::ShellCommandFailed
10.0.3.11 ------------------------------------
10.0.3.11 Expected process to exit with [0], but received '128'
10.0.3.11 ---- Begin output of git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 ----
10.0.3.11 STDOUT: 
10.0.3.11 STDERR: error: Couldn't resolve host 'github.com' while accessing https://github.com/BrightcoveOS/Diamond.git/info/refs
10.0.3.11 fatal: HTTP request failed
10.0.3.11 ---- End output of git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 ----
10.0.3.11 Ran git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 returned 128
10.0.3.11 
10.0.3.11 
10.0.3.11 Resource Declaration:
10.0.3.11 ---------------------
10.0.3.11 # In /var/chef/cache/cookbooks/diamond/recipes/install.rb
10.0.3.11 
10.0.3.11  52:       git node['diamond']['git_path'] do
10.0.3.11  53:         repository node['diamond']['git_repository_uri']
10.0.3.11  54:         reference node['diamond']['git_reference']
10.0.3.11 
10.0.3.11  55:         action :checkout
10.0.3.11  56:         not_if { ::File.exists?("#{node['diamond']['git_path']}/setup.py") }
10.0.3.11  57:       end
10.0.3.11  58: 
10.0.3.11 
10.0.3.11 
10.0.3.11 
10.0.3.11 
10.0.3.11 Compiled Resource:
10.0.3.11 ------------------
10.0.3.11 # Declared in /var/chef/cache/cookbooks/diamond/recipes/install.rb:52:in `from_file'
10.0.3.11 
10.0.3.11 git("/mnt/git/Diamond") do
10.0.3.11   provider Chef::Provider::Git
10.0.3.11   action [:checkout]
10.0.3.11   retries 0
10.0.3.11   retry_delay 2
10.0.3.11   destination "/mnt/git/Diamond"
10.0.3.11   revision "v3.3"
10.0.3.11   remote "origin"
10.0.3.11   cookbook_name "diamond"
10.0.3.11   recipe_name "install"
10.0.3.11   repository "https://github.com/BrightcoveOS/Diamond.git"
10.0.3.11   not_if { #code block }
10.0.3.11 end
10.0.3.11 
10.0.3.11 
10.0.3.11 
10.0.3.11 Recipe: sysctl::default
10.0.3.11   * template[/etc/sysctl.d/99-chef-attributes.conf] action create
10.0.3.11  (skipped due to only_if)
10.0.3.11 [2013-06-29T17:30:03+00:00] ERROR: Running exception handlers
10.0.3.11 [2013-06-29T17:30:03+00:00] FATAL: Saving node information to /var/chef/cache/failed-run-data.json
10.0.3.11 [2013-06-29T17:30:03+00:00] ERROR: Exception handlers complete
10.0.3.11 Chef Client failed. 5 resources updated
10.0.3.11 [2013-06-29T17:30:03+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
10.0.3.11 [2013-06-29T17:30:03+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: git[/mnt/git/Diamond] (diamond::install line 52) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '128'
10.0.3.11 ---- Begin output of git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 ----
10.0.3.11 STDOUT: 
10.0.3.11 STDERR: error: Couldn't resolve host 'github.com' while accessing https://github.com/BrightcoveOS/Diamond.git/info/refs
10.0.3.11 fatal: HTTP request failed
10.0.3.11 ---- End output of git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 ----
10.0.3.11 Ran git ls-remote https://github.com/BrightcoveOS/Diamond.git v3.3 returned 128
  -> PROVISION FAILED
jpipes@uberbox:~/repos/att-cloud/chef-repo$ bundle exec vagabond ssh ops
Vagabond: SSH connect to: ops
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.8.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

1 package can be updated.
0 updates are security updates.

***
Chef-Client - ops
Hostname: ubuntu1204-7Sf951xoOz1i
Chef Server: https://10.0.3.47
Environment: vagabond
Last Run: 2013-06-29 17:30:02 +0000

Roles:
  base_vagabond
  system-tools
  graphed
  audited
***

This is a Vagabond OpenStack system
Last login: Sat Jun 29 17:29:59 2013 from 10.0.3.1
-bash: /root/.bashrc: line 79: unexpected EOF while looking for matching `"'
-bash: /root/.bashrc: line 109: syntax error: unexpected end of file
root@ubuntu1204-7Sf951xoOz1i. 17:30:13:~# cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
search gateway.2wire.net
root@ubuntu1204-7Sf951xoOz1i. 17:30:19:~# ping gateway.2wire.net -c1
ping: unknown host gateway.2wire.net
root@ubuntu1204-7Sf951xoOz1i. 17:30:23:~# ping -c1 10.0.3.47
PING 10.0.3.47 (10.0.3.47) 56(84) bytes of data.
64 bytes from 10.0.3.47: icmp_req=1 ttl=64 time=0.097 ms

--- 10.0.3.47 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.097/0.097/0.097/0.000 ms

Is there some special sauce I need to put in my Vagabondfile to get external routing working properly? Here is my Vagabondfile, for reference. Thanks in advance for any insights...

{
    :nodes => {
        :ops => {
            :template => "ubuntu_1204",
            :environment => "vagabond",
            :ipaddress => "10.0.3.11",
            :run_list => [
                "role[base_vagabond]"
            ]
        }
    },
    :clusters => {
        :simple => [
            "ops"
        ]
    },
   :local_chef_server => {
       :zero => false,
       :berkshelf => true,
       :librarian => false,
       :enabled => true,
       :auto_upload => true
   },
   :sudo => true
}

BTW, really enjoying working with Vagabond. It's so much nicer than waiting for Vagrant and VirtualBox! :)

chrisroberts commented 11 years ago

If you update the nameserver in the container's resolv.conf file to 10.0.3.1, does that get things working? If not, what about a direct address like 8.8.8.8? I think this may have to do with the dnsmasq integration in NetworkManager on the host. If you find one that works, you can update the file in the base container at /var/lib/lxc/ubuntu_1204/rootfs/etc/resolv.conf. After that, newly created nodes will get the updated file and should be okay. Let me know what you find and I'll get an update applied for it. Cheers!
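For anyone who wants to script that manual edit, here is a minimal Ruby sketch. It is an illustration only: `replace_nameservers` is a hypothetical helper, not part of Vagabond, and it works on the text of a resolv.conf file rather than touching any file directly. Note that the file header shown earlier warns that resolvconf may overwrite hand edits, so a change like this may not persist inside a running container.

```ruby
# Hypothetical helper (not part of Vagabond): drop every existing
# nameserver entry from resolv.conf-style text and append a single
# replacement entry. Other directives (search, options) are kept.
def replace_nameservers(text, address)
  kept = text.lines.reject { |line| line =~ /\A\s*nameserver\s/ }
  kept.map(&:chomp).push("nameserver #{address}").join("\n") + "\n"
end

# Sample content taken from the container output earlier in this thread.
sample = <<~CONF
  nameserver 127.0.1.1
  search gateway.2wire.net
CONF

puts replace_nameservers(sample, "10.0.3.1")
```

To apply it to the base container, the result would be written back to the base container's resolv.conf under its rootfs, which typically requires root.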

jaypipes commented 11 years ago

Yes, indeed, Chris, that got the ops node "unstuck" :) I manually set the first nameserver to 10.0.3.1 instead of 127.0.1.1 and that fixed things.

Now... how do we get the LXC base template that Vagabond uses to set this automatically when we run vagabond init? ;)

chrisroberts commented 11 years ago

Okay, so I'm going to bet that it's a NetworkManager issue, since I don't see that behavior and I have NetworkManager's dnsmasq disabled. I will likely add a bit of logic to the vagabond cookbook's host configuration that inspects the host's resolv.conf entries and makes a best-guess default, which the Vagabondfile can override if required.
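One way the "best guess with override" idea could look, sketched in Ruby. This is an assumption about the approach, not the cookbook's actual logic: `guess_nameserver`, the `override:` parameter, and the `LXC_BRIDGE` constant are hypothetical names. The sketch skips loopback resolvers (such as the 127.0.1.1 entry that a host-local dnsmasq produces, which a container cannot reach) and falls back to the bridge address that worked in this thread.

```ruby
# Host address on the LXC bridge, as used earlier in this thread.
LXC_BRIDGE = "10.0.3.1"

# Hypothetical sketch (not Vagabond's implementation): choose a nameserver
# for containers from the text of the host's resolv.conf.
#   - an explicit override (e.g. from the Vagabondfile) always wins
#   - loopback resolvers are skipped, since they only work on the host
#   - if nothing usable remains, fall back to the LXC bridge address
def guess_nameserver(host_resolv_conf, override: nil)
  return override if override

  servers = host_resolv_conf.scan(/^\s*nameserver\s+(\S+)/).flatten
  usable = servers.reject { |s| s.start_with?("127.") || s == "::1" }
  usable.first || LXC_BRIDGE
end

host_conf = "nameserver 127.0.1.1\nsearch gateway.2wire.net\n"
puts guess_nameserver(host_conf)
puts guess_nameserver(host_conf, override: "8.8.8.8")
```

Taking the override first keeps the guess from ever fighting an explicit setting; how such a setting would be spelled in the Vagabondfile is left open here.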

jaypipes commented 11 years ago

OK, cool, thanks Chris. Let me know if I can assist or if you want me to experiment with stuff.