Closed jabl closed 7 years ago
This is a facts issue. ansible-pull doesn't gather facts from other hosts, so it can't figure out the IP of the admin node from hostvars.
We could update default variables so that ntp_config_server looks like:
ntp_config_server: [ "{{ kickstart_server_ip }}", "{{ central_log_host|replace('@', '') }}" ]
Testing that out.
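Written out as a block list, the proposed default could look like the sketch below. It assumes that kickstart_server_ip and central_log_host are both defined in group_vars, and that central_log_host carries an rsyslog-style "@" prefix that has to be stripped before the value can be used as an NTP server. Which file holds the default is not stated here, so no path is given.

```yaml
# Sketch of the proposed default; which file it belongs in is not stated above.
# Assumption: both variables come from group_vars, so no facts about
# other hosts are needed when running under ansible-pull.
ntp_config_server:
  - "{{ kickstart_server_ip }}"
  # Assumption: central_log_host has a leading "@" (or "@@") prefix.
  # Jinja2's replace filter removes every occurrence, leaving the bare address.
  - "{{ central_log_host | replace('@', '') }}"
```

Neither entry depends on hostvars of the install or admin groups, which is the point of the change.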
TASK [ansible-role-chrony : update chrony.conf from template] ******************
Friday 27 January 2017 10:51:06 +0200 (0:00:04.993) 0:00:25.845 ********
fatal: [io1]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: {{ ntp_config_server }}: [u\"{{ hostvars[groups['install'][0]]['ansible_hostname'] }}\", u\"{{ hostvars[groups['admin'][0]]['ansible_hostname'] }}\"]: 'dict object' has no attribute 'ansible_hostname'"}
There is a similar issue with the slurm_service_node variable, as that also comes from facts by default.
https://github.com/CSC-IT-Center-for-Science/ansible-role-fgci-install/pull/32
Okay, after changing the fgci-install role a few times, the ansible-pull run above doesn't fail anymore. The central_log_host|replace is a bit hacky, but the install.yml playbook doesn't have the admin group's hostvars to use, nor is the admin node's IP manually entered in group_vars. Added some documentation and reasoning.
How does it look?
Some thoughts:
Less clean than I had hoped for, but I guess it's unavoidable?
Another way would be to have a central store of the facts that the nodes can query when they run ansible-pull.
I haven't looked into this or tested it, but it seems cleaner and could perhaps also provide some performance boost: http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching
Using NFS (facts in JSON files in a directory) would be convenient, but we need the variables available before we run Ansible (when NFS is not available), so we'd need to run another service (like Redis) on the install node with a persistent store of the facts. There are also some notes in the docs about
I noticed that some of our settings weren't applied. In particular, on compute nodes one of the NTP servers is set to 10.1.1.1, which is incorrect for us. It turns out that this comes from pull_variables.yml, which overrides the settings from group_vars.
It seems to me that except for setting running_as_ansible_pull and slurm_munge_key_from_nfs to True, pull_variables.yml doesn't do anything useful. For example, where to pull from is already hardcoded in ansible-pull-script, which syncs the group_vars where everything else is defined.
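If that reading is right, pull_variables.yml could be cut down to just those two flags, roughly as sketched below. This assumes nothing else in the file is still needed by the pull run, which has not been verified here.

```yaml
---
# pull_variables.yml reduced to the two flags named above (a sketch).
# Assumption: every other setting, including the NTP servers,
# is left to group_vars so that site-specific values are not overridden.
running_as_ansible_pull: True
slurm_munge_key_from_nfs: True
```

With the NTP server entry removed from this file, the value from group_vars would no longer be replaced by 10.1.1.1 on the compute nodes.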