debops / ansible-pki

Bootstrap and manage internal PKI, Certificate Authorities and OpenSSL/GnuTLS certificates
GNU General Public License v3.0

Role debops.pki/env fails to resolve after Ansible 2.4 #127

Open jjzazuet opened 5 years ago

jjzazuet commented 5 years ago

Hi. Like the title says. I was previously using the reference playbook to perform PKI certificate creation and exchange in a 4-node cluster, so I'm pretty sure my playbook used to work. I recently tried running it under both Ansible 2.4 and Ansible 2.6, on both macOS and Debian 9 (stretch). Both fail with the same error message:

bash-3.2$ ansible-playbook site-pki.yml --ask-vault-pass
Vault password:
ERROR! the role 'debops.pki/env' was not found in /Users/jjzazuet/code/gopher/devops/roles:/Users/jjzazuet/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/Users/jjzazuet/code/gopher/devops

The error appears to have been in '/Users/jjzazuet/code/gopher/devops/site-pki.yml': line 4, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  roles:
    - role: debops.pki/env
      ^ here

bash-3.2$ 

And under Debian:

root@ny-mightygopher:~/gopher/devops# ansible-playbook site-pki.yml --ask-vault-pass
Vault password:
ERROR! the role 'debops.pki/env' was not found in /root/gopher/devops/roles:/root/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/root/gopher/devops

The error appears to have been in '/root/gopher/devops/site-pki.yml': line 4, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  roles:
    - role: debops.pki/env
      ^ here

root@mightygopher:~/qnoa/devops#

Upon further inspection of the role's code, it appears that the env subrole uses symlinks to point back to shared resource files in the parent role (but I'm not fully certain). I also see that the role's codebase is a few years old in general, so it's also possible that a new Ansible release broke the role's resource resolution strategy.

Any help or feedback is appreciated. Thanks for the awesome framework! 👍

drybjed commented 5 years ago

Hello,

I'm afraid I'm not sure why this happens. The role hasn't changed much; besides, this looks like an issue with finding the roles themselves, i.e. a problem with the Ansible configuration. Since you are using a custom playbook, I assume that you installed the roles "manually" somewhere. Is the main role named debops.pki? Can you show the playbook that you are using and the contents of the roles_path variable in ansible.cfg? Check whether the role is in one of the directories listed there.

Development of the DebOps codebase has shifted to a monorepo; you might look into it to get the latest changes. The standalone roles will at some point be archived on GitHub.
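The check suggested above can be sketched as a small shell helper. This is only an approximation of how Ansible searches for roles, assuming that a role name such as debops.pki/env is looked up as a subdirectory of each roles_path entry; the function name find_role is hypothetical.

```shell
#!/bin/sh
# find_role ROLES_PATH ROLE_NAME  (hypothetical helper, not part of Ansible)
# Prints every directory in the colon-separated ROLES_PATH that contains
# ROLE_NAME. Returns 0 if at least one match was found, 1 otherwise.
find_role() {
    roles_path="$1"
    role_name="$2"
    found=1
    old_ifs="$IFS"
    IFS=:
    for dir in $roles_path; do
        if [ -d "$dir/$role_name" ]; then
            printf '%s\n' "$dir/$role_name"
            found=0
        fi
    done
    IFS="$old_ifs"
    return "$found"
}

# Example: search two of the directories named in the error message.
find_role "$HOME/.ansible/roles:/etc/ansible/roles" debops.pki/env \
    || echo "debops.pki/env not found in the given roles path"
```

If nothing is printed for any directory in the list from the error message, the subrole directory is missing on disk rather than misconfigured.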

jjzazuet commented 5 years ago

Hi @drybjed, thanks for the tip. Yes, these are the commands I'm using on the Debian deployment box to prepare the environment for running Ansible playbooks:

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 93C4A3FD7BB9C367
echo 'deb http://ppa.launchpad.net/ansible/ansible/ubuntu trusty main' | tee /etc/apt/sources.list.d/ansible.list
apt-get update -y; apt-get install -y git ansible python-pip

pip install netaddr

ansible-galaxy install esolitos.resolv;
ansible-galaxy install holms.fqdn;
ansible-galaxy install debops.secret;
ansible-galaxy install debops.pki;
ansible-galaxy install debops.grub;
ansible-galaxy install debops-contrib.apparmor;
ansible-galaxy install dev-sec.ssh-hardening;

The Ansible Galaxy dependencies do indeed list debops.pki as a top-level role. Again, I remember these roles being enough to bootstrap PKI certificates throughout the cluster. I believe the paths Ansible searches for roles are:

/root/gopher/devops/roles
/root/.ansible/roles
/usr/share/ansible/roles
/etc/ansible/roles
/root/gopher/devops

On my Mac, the roles as installed by Galaxy are here:

bash-3.2$ ls -la ~/.ansible/roles/
total 0
drwxr-xr-x   9 jjzazuet  staff  288 Aug 26 17:10 .
drwx------   5 jjzazuet  staff  160 Aug 26 17:13 ..
drwxr-xr-x  16 jjzazuet  staff  512 Aug 26 17:10 debops-contrib.apparmor
drwxr-xr-x  14 jjzazuet  staff  448 Aug 26 17:10 debops.grub
drwxr-xr-x  14 jjzazuet  staff  448 Aug 26 17:10 debops.pki
drwxr-xr-x  14 jjzazuet  staff  448 Aug 26 17:10 debops.secret
drwxr-xr-x  22 jjzazuet  staff  704 Aug 26 17:10 dev-sec.ssh-hardening
drwxr-xr-x   9 jjzazuet  staff  288 Aug 26 17:10 esolitos.resolv
drwxr-xr-x  13 jjzazuet  staff  416 Aug 26 17:10 holms.fqdn
bash-3.2$

And the contents of debops.pki:

bash-3.2$ ls -la ~/.ansible/roles/debops.pki/
total 8
drwxr-xr-x  14 jjzazuet  staff  448 Aug 26 17:10 .
drwxr-xr-x   9 jjzazuet  staff  288 Aug 26 17:10 ..
-rw-rw-r--   1 jjzazuet  staff  803 Aug  6 07:10 COPYRIGHT
drwxr-xr-x   9 jjzazuet  staff  288 Aug 26 17:10 _cacher_ng
drwxr-xr-x   7 jjzazuet  staff  224 Aug 26 17:10 _install
drwxr-xr-x   7 jjzazuet  staff  224 Aug 26 17:10 _listchanges
drwxr-xr-x   8 jjzazuet  staff  256 Aug 26 17:10 _mark
drwxr-xr-x   7 jjzazuet  staff  224 Aug 26 17:10 _preferences
drwxr-xr-x   8 jjzazuet  staff  256 Aug 26 17:10 _proxy
drwxr-xr-x  29 jjzazuet  staff  928 Aug 26 17:10 debops-0.8.0
drwxr-xr-x   3 jjzazuet  staff   96 Aug 26 17:10 defaults
drwxr-xr-x   4 jjzazuet  staff  128 Aug 26 17:10 meta
drwxr-xr-x   3 jjzazuet  staff   96 Aug 26 17:10 tasks
drwxr-xr-x   4 jjzazuet  staff  128 Aug 26 17:10 templates
bash-3.2$

It makes sense that Ansible wouldn't be able to find the env subrole in that structure, so I'm just wondering if something changed in the way Galaxy is installing this role.

I'd be happy to try and migrate to the mono repo version of debops since I haven't released my production infrastructure yet.

Thanks again for the help!

drybjed commented 5 years ago

That explains everything, thanks. In essence, the Ansible Galaxy backend and its handling of roles changed some time ago to enable support for multi-role repositories, among other things. I played with supporting the new Galaxy a bit in the DebOps monorepo, but the current state of how ansible-galaxy or mazer install it doesn't look very promising. It looks like a few of the DebOps roles like debops.apt_* are installed in a broken state, then the DebOps monorepo is included in a weird way... No idea what to do about it.

It's especially puzzling for me, because I only messed around with the DebOps monorepo in the Galaxy database, and I left the older, separate role repositories intact. No idea why the monorepo gets pulled in along with the debops.apt_* roles when you specifically install debops.pki. Perhaps @chouseknecht would be interested in this.

For now, I would suggest that you avoid using ansible-galaxy or mazer to install DebOps roles and/or the monorepo. I just tried installing the monorepo directly via the repository URL but ansible-galaxy failed, although that might be due to an old version. There are a few other ways to handle the installation. You could clone the monorepo directly to ~/.local/share/debops/debops/ and add that path to the roles_path variable. Or you could install DebOps via pip install debops; the Python package contains a snapshot of the DebOps roles at a specific tag, which might be handy if you want to stick to stable releases. Otherwise, after installing the debops Python package you can run debops-update to get the latest changes in the monorepo. Check the installation instructions for more details.

jjzazuet commented 5 years ago

I'll give the pip install path a try. Will report back when updated. Thanks!

jjzazuet commented 5 years ago

Ok I just pip installed the monorepo, and I find that the following folders were correctly installed on my Mac.

local-dev00:/ jjzazuet$ ls -la ./usr/local/lib/python2.7/site-packages/debops/ansible/roles/debops.pki
total 8
drwxr-xr-x   10 jjzazuet  staff   320 Aug 27 21:59 .
drwxr-xr-x  154 jjzazuet  staff  4928 Aug 27 21:59 ..
-rw-r--r--    1 jjzazuet  staff   785 Aug 27 21:58 COPYRIGHT
drwxr-xr-x    3 jjzazuet  staff    96 Aug 27 21:59 defaults
drwxr-xr-x    5 jjzazuet  staff   160 Aug 27 21:59 env
drwxr-xr-x    4 jjzazuet  staff   128 Aug 27 21:59 files
drwxr-xr-x    3 jjzazuet  staff    96 Aug 27 21:59 handlers
drwxr-xr-x    3 jjzazuet  staff    96 Aug 27 21:59 meta
drwxr-xr-x    6 jjzazuet  staff   192 Aug 27 21:59 tasks
drwxr-xr-x    3 jjzazuet  staff    96 Aug 27 21:59 templates
local-dev00:/ jjzazuet$

Should I now tell Ansible to include the role's mono-repo path via the roles_path variable? Or should it be able to locate the mono-repo on its own?

Thanks again!

drybjed commented 5 years ago

Yes, when you add /usr/local/lib/python2.7/site-packages/debops/ansible/roles/ path to roles_path, Ansible should be able to find the roles there.
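As a rough sketch of this step: besides the roles_path key in ansible.cfg, Ansible also reads the ANSIBLE_ROLES_PATH environment variable, which takes precedence over the value in the config file. The helper below (append_roles_path is a hypothetical name) appends a directory to a colon-separated list unless it is already present.

```shell
#!/bin/sh
# append_roles_path CURRENT EXTRA  (hypothetical helper)
# Prints CURRENT with EXTRA appended, unless EXTRA is already listed.
# A single trailing slash on EXTRA is stripped before comparing.
append_roles_path() {
    current="$1"
    extra="${2%/}"
    case ":$current:" in
        *":$extra:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${current:+$current:}$extra" ;;
    esac
}

# Path taken from the comment above; adjust it to wherever pip put DebOps.
ANSIBLE_ROLES_PATH="$(append_roles_path "${ANSIBLE_ROLES_PATH:-}" \
    /usr/local/lib/python2.7/site-packages/debops/ansible/roles/)"
export ANSIBLE_ROLES_PATH
```

Because the environment variable replaces the configured roles_path instead of extending it, any other role directories still in use (for example ~/.ansible/roles) would need to be included in the value as well.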

jjzazuet commented 5 years ago

@drybjed ok so I managed to get Ansible to source the debops playbooks from the additional install path, but I now seem to be running into the same issue as https://github.com/debops/debops-tools/issues/117 :(

fatal: [ny-api00]: FAILED! => {"msg": "lookup plugin (task_src) not found"}

I also tried adding .debops.cfg at the root of my playbook hierarchy as:

[paths]
data-home: /usr/local/lib/python2.7/dist-packages/debops

My apologies, I'm running out of ideas as to what I could be doing wrong. Any advice is appreciated. Thanks!

jjzazuet commented 5 years ago

Ok this seems to have done the trick:

lookup_plugins=/usr/local/lib/python2.7/dist-packages/debops/ansible/playbooks/lookup_plugins
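For reference, a minimal sketch of how that setting could sit next to roles_path in an ansible.cfg, assuming both keys go in the [defaults] section and that the pip-installed package keeps the ansible/roles and ansible/playbooks/lookup_plugins layout seen in this thread. The function name print_debops_cfg is hypothetical.

```shell
#!/bin/sh
# print_debops_cfg DEBOPS_DIR  (hypothetical helper)
# Prints an ansible.cfg fragment that points Ansible at the roles and the
# lookup plugins shipped inside a pip-installed DebOps package directory.
print_debops_cfg() {
    debops_dir="${1%/}"   # drop one trailing slash, if any
    cat <<EOF
[defaults]
roles_path = ${debops_dir}/ansible/roles
lookup_plugins = ${debops_dir}/ansible/playbooks/lookup_plugins
EOF
}

# The Debian path used above; on the Mac the package lived under
# site-packages instead of dist-packages.
print_debops_cfg /usr/local/lib/python2.7/dist-packages/debops
```

The output is printed to stdout on purpose, so it can be reviewed before being merged into an existing ansible.cfg by hand.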

Sigh... next time I'll need to think twice before running my playbooks with a later version of Ansible. So I guess I should freeze the version at 2.6.

Thanks again for the help!

jjzazuet commented 5 years ago

Ah, my apologies, I've just stumbled upon a new error while running the pki role. Apparently, an intermediate step in the role fails due to some kind of network error. In this example, I have four hosts doing the PKI certificate exchange, and in subsequent runs a different pair of hosts might fail for no apparent reason. Here's the failing step's output:

TASK [debops.pki : Upload internal certificate requests] *************************************************************************************************
failed: [ste-api02] (item={u'subject_alt_names': [u'dns:ste-api02.gopher.io', u'dns:localhost', u'ip:108.61.41.194', u'ip:10.0.0.182', u'ip:127.0.0.1'], u'name': u'gopher.io', u'acme': False, u'subject': [u'cn=node']}) => {"changed": false, "checksum": "003eb9351c451307c9716ab35cef9ae78560c3ec", "dest": "/etc/vault/./api/pki/requests/domain/gopher.io/gopher.io/request.pem", "file": "/etc/pki/realms/gopher.io/internal/request.pem", "item": {"acme": false, "name": "gopher.io", "subject": ["cn=node"], "subject_alt_names": ["dns:ste-api02.gopher.io", "dns:localhost", "ip:108.61.41.194", "ip:10.0.0.182", "ip:127.0.0.1"]}, "md5sum": "714cf8916ad9da5d0e9827ca33f6d340", "msg": "checksum mismatch", "remote_checksum": "7a7abe9289fc2813cc755ae894a68cd2ba45250a", "remote_md5sum": null}
changed: [ste-api00] => (item={u'subject_alt_names': [u'dns:ste-api00.gopher.io', u'dns:localhost', u'ip:104.243.38.42', u'ip:10.0.0.180', u'ip:127.0.0.1'], u'name': u'gopher.io', u'acme': False, u'subject': [u'cn=node']})
failed: [ste-api01] (item={u'subject_alt_names': [u'dns:ste-api01.gopher.io', u'dns:localhost', u'ip:209.222.98.74', u'ip:10.0.0.181', u'ip:127.0.0.1'], u'name': u'gopher.io', u'acme': False, u'subject': [u'cn=node']}) => {"changed": false, "checksum": "003eb9351c451307c9716ab35cef9ae78560c3ec", "dest": "/etc/vault/./api/pki/requests/domain/gopher.io/gopher.io/request.pem", "file": "/etc/pki/realms/gopher.io/internal/request.pem", "item": {"acme": false, "name": "gopher.io", "subject": ["cn=node"], "subject_alt_names": ["dns:ste-api01.gopher.io", "dns:localhost", "ip:209.222.98.74", "ip:10.0.0.181", "ip:127.0.0.1"]}, "md5sum": "714cf8916ad9da5d0e9827ca33f6d340", "msg": "checksum mismatch", "remote_checksum": "e6ec61446b5cc627ec057fd061021f5a0bf80f75", "remote_md5sum": null}
changed: [ste-bld00] => (item={u'subject_alt_names': [u'dns:ste-bld00.gopher.io', u'dns:localhost', u'ip:216.155.144.90', u'ip:10.0.0.100', u'ip:127.0.0.1'], u'name': u'gopher.io', u'acme': False, u'subject': [u'cn=node']})

Any feedback or help is appreciated. Thanks!

drybjed commented 5 years ago

Hmm, I'm not sure what the cause might be here. You could try clearing out the secret/pki/requests/ directory on the Ansible Controller and see if that changes anything. If you are using a custom Ansible playbook, can you show it?
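A cautious way to try this is to move the directory aside instead of deleting it, so the old requests can be restored if the change makes no difference. The sketch below assumes the requests live in pki/requests under the project's secret directory; the function name stash_pki_requests and the backup naming scheme are hypothetical.

```shell
#!/bin/sh
# stash_pki_requests SECRET_DIR  (hypothetical helper)
# Renames SECRET_DIR/pki/requests to a timestamped backup directory and
# prints the backup path. Does nothing if the directory does not exist.
stash_pki_requests() {
    requests_dir="${1%/}/pki/requests"
    if [ ! -d "$requests_dir" ]; then
        echo "nothing to do: $requests_dir does not exist" >&2
        return 0
    fi
    backup_dir="$requests_dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$requests_dir" "$backup_dir" && printf '%s\n' "$backup_dir"
}
```

It would be called on the Ansible Controller with the path of the secret directory, for example stash_pki_requests ./secret, followed by a fresh playbook run.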