Closed by mhakala 8 years ago
After this change, copying the host keys to the install node's ssh_host_keys dir seems to fail in ansible push mode (I also tried the nfs.yml playbook):
$ ansible-playbook compute.yml -t host-keys -CD -e install_ssh_host_keys_to_nfs=True -e generate_ssh_known_hosts=True -l io1
and it fails with this:
TASK [ansible-role-sshd-host-keys : copy the host keys to the install node ssh_host_keys_dir] ***
fatal: [io1]: FAILED! => {"failed": true, "msg": "the file_name '/home/admin/ansible/io/fgci-ansible/files/nodes/'[u'io1', u'io2', u'io3', u'io4', u'gpu']'/ssh/ssh_host_ed25519_key' does not exist, or is not readable"}
/home/admin/ansible/io/fgci-ansible/files/nodes/io1/ssh/ssh_host_ed25519 does exist
Yep, this turned out to be a bit more complicated with "with_items". Judging from the Jinja2 output, the value is not read as an array but as a string, so the whole list ends up as a single item in the path. I guess you can sack the PR for now.
As a better alternative we could separate the functionality altogether, e.g. with a host-keys-generate role that would only be run when adding nodes. Compute nodes would then only fetch their per-node keys and known_hosts.
Yes, a new playbook for adding or deleting compute nodes would be useful. Maybe one role is better. There are other things that we only do when creating a new host (DNS, pdsh, etc.). Do those other tasks also take long for you? Maybe those could be taken out as well. Or is it just that host-keys-generate takes so much longer than the other ones?
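To illustrate the failure mode, here is a minimal Python sketch. It is only a simplified model of how a loop treats its input (a list is iterated, any other value becomes a single item), not Ansible's actual code. The helper names `loop_items` and `key_paths` are made up for this example, and Python 3 prints the list without the `u'...'` prefixes seen in the error above.

```python
BASE = "/home/admin/ansible/io/fgci-ansible/files/nodes"


def loop_items(value):
    """Simplified model of a loop input: iterate lists, wrap anything else."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def key_paths(value):
    """Build one host key path per loop item."""
    return [f"{BASE}/{item}/ssh/ssh_host_ed25519_key" for item in loop_items(value)]


nodes = ["io1", "io2", "io3", "io4", "gpu"]

# The list is passed through intact: one path per node.
as_list = key_paths(nodes)

# The list was rendered to a string first: one bogus path containing the whole list.
as_string = key_paths(str(nodes))

print(as_list[0])
print(as_string[0])
```

The second path has the same shape as the one in the error message: the entire node list sits where a single hostname should be.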
Feature request/issue for creating a delete host playbook: https://github.com/CSC-IT-Center-for-Science/fgci-ansible/issues/124
Closing this PR.
I've now merged the addition of the profile_tasks callback plugin, so one can see how long each task takes. On our test system with only 5 nodes, the tasks in the sshd-host-keys role are not in the top 20 in ansible-pull (for an already installed node). In total for 5 nodes, that role takes 4.446 seconds in one ansible-pull run, and 4.471 seconds on the first ansible-pull run after a reinstall.
How long does it take for you?
Maybe this can help (no warranty on this one-liner :)
summa=0; for i in $(grep sshd-host-key -A1 /var/log/ansible.log |grep $(date +%A)|cut -d "(" -f2|cut -d ")" -f1|cut -d "." -f2); do summa=$((summa+$(echo $i|sed -e 's/^0*//' -e 's/^$/0/'))); done; echo "aggregated time in ms for sshd-host-keys: $summa"
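Note that the one-liner above only adds up the fractional part after the dot, so a task that takes longer than a second is undercounted. A rough Python sketch that also counts hours, minutes and seconds could look like this. It assumes the timing lines carry an elapsed time in parentheses in the form `(H:MM:SS.mmm)`, which is an assumption about the log format; `total_ms` is a made-up helper, and selecting the lines for the role (the `grep` part) is left to the caller.

```python
import re

# Assumed elapsed-time format inside parentheses, e.g. "(0:00:01.250)".
ELAPSED = re.compile(r"\((\d+):(\d{2}):(\d{2})\.(\d+)\)")


def total_ms(lines):
    """Sum the parenthesised elapsed times of the given lines in milliseconds."""
    total = 0
    for line in lines:
        match = ELAPSED.search(line)
        if match is None:
            continue  # line carries no timing information
        hours, minutes, seconds, fraction = match.groups()
        # Normalise the fraction to exactly three digits (milliseconds).
        millis = int(fraction[:3].ljust(3, "0"))
        whole_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
        total += whole_seconds * 1000 + millis
    return total


sample = [
    "Tuesday 05 April 2016  12:00:00 +0300 (0:00:01.250)       0:00:10.000 ****",
    "a line without any timing",
    "Tuesday 05 April 2016  12:00:01 +0300 (0:00:00.045)       0:00:11.250 ****",
]
print("aggregated time in ms:", total_ms(sample))
```

The sample lines are invented for the example and are not taken from a real log.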
In the normal case (ansible-pull), copying the keys to NFS is skipped. Yet Ansible still iterates over all the groups.compute elements, which is really slow if there are many nodes. Fixed this by reducing the item set to a single node when the copying is skipped. This will significantly speed up ansible-pull.
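The idea can be sketched in plain Python. This is only a model of the loop behaviour, not the actual change to the role (which is a template expression in the task and is not shown in this thread); `items_to_loop` and `run_copy_task` are hypothetical names.

```python
def items_to_loop(compute_nodes, copy_enabled):
    """Pick the loop set: all nodes when copying, a single node when it is skipped."""
    if copy_enabled:
        return list(compute_nodes)
    # The task is skipped anyway, so one placeholder item is enough.
    return list(compute_nodes[:1])


def run_copy_task(compute_nodes, copy_enabled):
    """Count how many loop iterations are evaluated and how many copies happen."""
    evaluated = 0
    copied = 0
    for _node in items_to_loop(compute_nodes, copy_enabled):
        evaluated += 1  # every item costs time, even if it is then skipped
        if copy_enabled:
            copied += 1
    return evaluated, copied


nodes = [f"c{i}" for i in range(1, 501)]  # hypothetical 500-node cluster
print(run_copy_task(nodes, copy_enabled=False))
print(run_copy_task(nodes, copy_enabled=True))
```

With copying disabled, the loop evaluates one item instead of one per compute node, which is where the time is saved in ansible-pull.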