bradenwright / kitchen-lxd_cli

Test Kitchen driver for LXD

Centos support? -- Waiting for /root/.ssh to become available... #4

Closed juju4 closed 8 years ago

juju4 commented 8 years ago

Hello,

I recently discovered your lxd module, and it's fantastic for lightweight testing with kitchen and ansible. I've started to leverage travis (see the other issue), but centos is not starting correctly:

D      Found Ip Address 10.252.116.34
       Setting up public key /home/travis/.ssh/id_rsa.pub on default-centos-7
       Check /root/.ssh on default-centos-7
D      Waiting for /root/.ssh to become available...
D      run_local_command ran: lxc exec default-centos-7 -- ls /root/.ssh > /dev/null 2>&1
D      Command finished: pid 5753 exit 2
D      Waiting for /root/.ssh to become available...
D      run_local_command ran: lxc exec default-centos-7 -- ls /root/.ssh > /dev/null 2>&1
D      Command finished: pid 5766 exit 2
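The debug output above shows the same `lxc exec ... -- ls /root/.ssh` check being repeated until it succeeds. A minimal shell sketch of that kind of polling loop, with a bounded number of attempts, could look like the following. This is not the driver's actual code; the function name `wait_for_path`, its arguments and its defaults are all hypothetical.

```shell
# Poll until a path exists inside a container, or give up after a number of tries.
# Hypothetical helper: name, arguments and defaults are illustrative only.
wait_for_path() {
    wfp_container=$1
    wfp_path=$2
    wfp_tries=${3:-30}   # maximum number of attempts
    wfp_delay=${4:-2}    # seconds to sleep between attempts

    wfp_i=0
    while [ "$wfp_i" -lt "$wfp_tries" ]; do
        # Same check as in the debug log: exit status 0 means the path exists.
        if lxc exec "$wfp_container" -- ls "$wfp_path" > /dev/null 2>&1; then
            return 0
        fi
        wfp_i=$((wfp_i + 1))
        sleep "$wfp_delay"
    done

    echo "timed out waiting for $wfp_path on $wfp_container" >&2
    return 1
}

# Usage (not run here):
#   wait_for_path default-centos-7 /root/.ssh 30 2
```

With a bound like this, an image that never creates /root/.ssh produces a clear timeout instead of an endless wait.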

See details at https://travis-ci.org/juju4/ansible-adduser/jobs/148556427 (centos-7) and https://travis-ci.org/juju4/ansible-adduser/jobs/148556426 (centos-6).

Any hints on how to make it work?

Thanks

Note: I should be running master, so this is different from https://github.com/bradenwright/kitchen-lxd_cli/issues/2

bradenwright commented 8 years ago

I actually just started messing with this a little a week or 2 ago. At home I basically only use Ubuntu, but at work we have some Centos machines, so I was messing around. There's a handful of things to do; I was manually trying to set up an image.

I'm having trouble with my lxd dhcp setup, so I had to set up static networking, which it doesn't look like you will need to do. But just for completeness, for networking I manually set up:

Log in via lxc exec <container name> bash and edit:

1) /etc/sysconfig/network-scripts/ifcfg-eth0
2) /etc/resolv.conf

Essentially, make sure you can ping with dns resolution.

Once networking is working, the driver waits for the /root/.ssh directory to exist. Since ubuntu creates it, it's a way to wait for startup: I was running into a race condition, so to speak, where most of the time kitchen was trying to ssh before the container had fully started. So you need to create /root/.ssh:

3) lxc exec <container name> -- mkdir /root/.ssh

That will allow your public key to be set up. However, the centos-6 image I used (a public one) doesn't have an ssh server installed.

4) lxc exec <container name> -- yum install -y openssh-server

Make sure it starts; for me it didn't:

5) lxc exec <container name> -- service sshd start

Next, you'll notice that if you try to ssh into the container, it tells you that you must update the root password. I didn't know the existing password, so I logged into the container and updated it.

6) a) lxc exec <container name> bash
   b) passwd

At this point everything should be set to work.

You can publish your container using:

7) lxc publish <container name> --alias <image name>

Then update your kitchen lxd yml config, specifically image_name, to use the published image.
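The non-interactive parts of steps 3) to 7) can be collected into one shell function, sketched below. This is a rough sketch, not something the driver provides: the function name `prepare_centos_image` and its arguments are hypothetical. It stops the container before publishing, as the sequence later in this thread does. Step 6) is left manual because `passwd` prompts for input.

```shell
# Hypothetical helper wrapping steps 3) to 7); container and image names are placeholders.
prepare_centos_image() {
    pci_container=$1
    pci_image=$2

    # 3) create the directory the driver waits for
    lxc exec "$pci_container" -- mkdir -p /root/.ssh || return 1
    # 4) the public image ships without an ssh server
    lxc exec "$pci_container" -- yum install -y openssh-server || return 1
    # 5) make sure sshd is running
    lxc exec "$pci_container" -- service sshd start || return 1
    # 6) is interactive (passwd asks for a new root password), so do it by hand
    #    before calling this function, or skip it if you only use key-based login.
    # 7) stop the container, then publish it as a reusable image
    lxc stop "$pci_container" --force || return 1
    lxc publish "$pci_container" --alias "$pci_image"
}

# Usage (not run here):
#   prepare_centos_image my-centos-6 centos-6-ssh
```

Each step returns early on failure, so a broken network or a failed yum install does not end in a half-configured image being published.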

So I think that should work. I'm definitely open to PRs for centos support, but given how little I personally use centos, I don't know if I'll find the time to change this driver so that it supports centos without manually setting up the container.

Hope this helps.

juju4 commented 8 years ago

Thanks for the quick reply. I do the network setup inside the travis configuration, mostly using the lxd bridge (an arbitrary choice of private subnet, following https://insights.ubuntu.com/2016/04/07/lxd-networking-lxdbr0-explained/): https://github.com/juju4/ansible-adduser/blob/master/.travis.yml

  - sudo perl -pi -e 's@^LXD_IPV4_ADDR=""@LXD_IPV4_ADDR="10.252.116.1"@;s@^LXD_IPV4_NETMASK=""@LXD_IPV4_NETMASK="255.255.255.0"@;s@^LXD_IPV4_NETWORK=""@LXD_IPV4_NETWORK="10.252.116.1/24"@;s@^LXD_IPV4_DHCP_RANGE=""@LXD_IPV4_DHCP_RANGE="10.252.116.2,10.252.116.254"@;s@^LXD_IPV4_DHCP_MAX=""@LXD_IPV4_DHCP_MAX="252"@;' /etc/default/lxd-bridge
  - sudo service lxd restart

From your explanation, it seems you can't use the default lxc images from https://images.linuxcontainers.org? As the container is already up (lxc exec ... ls /etc/hosts works), I would expect kitchen to push its own ephemeral key like vagrant does, per instance if possible. When I check the default ubuntu images, /root/.ssh exists and so does authorized_keys, but it is empty, so you need to push an ssh key anyway if you use ssh as the transport.

You are right, openssh-server is not installed by default, so either kitchen can do the install or we can ask the image maintainers upstream to add it. Hmm, it seems to have been removed intentionally: https://github.com/lxc/lxc-ci/blob/master/templates/centos.json. The same goes for ubuntu there, but that's not the case for the official ubuntu images, which are used by default. So the former option would probably be best: kitchen-lxd_cli installs sshd, maybe depending on the provisioner (it is required for ansible).

I also hit the issue between lxd and centos dhcp. Just calling 'dhclient eth0' manually seems to work for me.

If you push an ssh key into authorized_keys, I believe you don't need to care about password locking. I'm not sure which key you are using on the ubuntu side.
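As an illustration of pushing a key into authorized_keys, here is a minimal sketch using `lxc file push`. It is not what the driver does internally; the function name `push_public_key` and its arguments are hypothetical. It assumes the installed lxc client accepts the `--uid`, `--gid` and `--mode` options on `lxc file push`. Note that it overwrites any existing authorized_keys file.

```shell
# Hypothetical helper: install a host public key as root's authorized_keys
# inside a container. Overwrites an existing authorized_keys file.
push_public_key() {
    ppk_container=$1
    ppk_pubkey=$2   # path to a public key on the host, e.g. ~/.ssh/id_rsa.pub

    if [ ! -r "$ppk_pubkey" ]; then
        echo "cannot read public key: $ppk_pubkey" >&2
        return 1
    fi

    # sshd is strict about ownership and permissions of these paths.
    lxc exec "$ppk_container" -- mkdir -p /root/.ssh || return 1
    lxc exec "$ppk_container" -- chmod 700 /root/.ssh || return 1
    lxc file push --uid=0 --gid=0 --mode=0600 \
        "$ppk_pubkey" "$ppk_container/root/.ssh/authorized_keys"
}

# Usage (not run here):
#   push_public_key default-centos-7 "$HOME/.ssh/id_rsa.pub"
```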

I managed to get it working with the following sequence:

    - "lxc image copy images:/centos/7/amd64 local: --alias=centos-7-nossh"

- name: force restart of lxd to have working network
  service: name=lxd state=restarted
  when: lxcdconf.changed

- name: pre-configure lxc images with sshd
  shell: "{{ item }}"
  with_items:
    # the container name must match the one used by the commands below
    - "lxc init centos-7-nossh default-centos-7"
    - "lxc start default-centos-7"
    - "lxc exec default-centos-7 -- dhclient eth0"
    - "lxc exec default-centos-7 -- yum install -y openssh-server sudo"
    - "lxc exec default-centos-7 -- systemctl enable sshd"
    - "lxc exec default-centos-7 -- systemctl start sshd"
    - "lxc exec default-centos-7 -- mkdir /root/.ssh"
    - "openssl rand -base64 32 | lxc exec default-centos-7 -- passwd root --stdin"
    - "lxc stop default-centos-7 --force"
    - "lxc publish default-centos-7 --alias centos-7"
    # the lxc client removes containers with 'delete'
    - "lxc delete default-centos-7"

from my jenkins ansible role

Also, a strange thing: lxc stop on centos instances seems to stall, so I had to use --force. The same happens with kitchen test.

Also, dns resolution was working fine at the beginning, and after remounting my jenkins server (with ansible, of course) it is not working anymore... Trying to use google dns didn't help. I will continue on centos later...

bradenwright commented 8 years ago

I've randomly seen stop hang, so I'll probably just update kitchen to use the --force flag by default. Glad you were able to get centos working without too much extra work.
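An alternative to always forcing would be to try a clean stop first and only fall back to --force if it fails or times out. A minimal sketch of that idea is below. It is not the driver's implementation: the function name `stop_container` and the default timeout are hypothetical, and it assumes the installed lxc client supports `lxc stop --timeout`.

```shell
# Hypothetical helper: clean stop with a time limit, then a forced stop as fallback.
stop_container() {
    sc_container=$1
    sc_timeout=${2:-30}   # seconds to wait for a clean shutdown (arbitrary default)

    if lxc stop "$sc_container" --timeout "$sc_timeout"; then
        return 0
    fi

    echo "clean stop of $sc_container failed, forcing" >&2
    lxc stop "$sc_container" --force
}

# Usage (not run here):
#   stop_container default-centos-7 30
```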

juju4 commented 8 years ago

Just for reference, here is a working setup in travis: https://github.com/juju4/ansible-adduser/blob/master/.travis.yml and https://travis-ci.org/juju4/ansible-adduser/builds/150469632. Centos fails on the serverspec script for some reason.

In the end, there are 3 network "issues":

$ sudo -E su $USER -c "lxc exec run-${distribution}-${version//./} -- env"
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
container=lxc
http_proxy=http://[fe80::1%eth0]:13128
HOME=/root
TERM=xterm
USER=root

I have no idea where this http_proxy value is coming from, but it makes all network calls fail...
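To spot this kind of stray variable quickly across several containers, the `env` output can be filtered for proxy settings. A small sketch is below; the function name `list_proxy_vars` is hypothetical, and the list of variable names it matches is an assumption about which ones matter.

```shell
# Hypothetical helper: read `env` output on stdin and print only proxy-related
# variables (matched case-insensitively, so HTTP_PROXY is caught too).
list_proxy_vars() {
    grep -i -E '^(http|https|ftp|all|no)_proxy='
}

# Usage (not run here):
#   lxc exec <container name> -- env | list_proxy_vars
```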

I pushed a lxd ansible role here: https://github.com/juju4/ansible-lxd

juju4 commented 8 years ago

For reference, some image preconfiguration is detailed in this ansible role https://github.com/juju4/ansible-lxdconfigure/blob/master/defaults/main.yml