Get rid of pull_variables.yml?

fgci-org / fgci-ansible

:microscope: Collection of the Finnish Grid and Cloud Infrastructure Ansible playbooks

MIT License


Get rid of pull_variables.yml? #182

Closed jabl closed 7 years ago

jabl commented 7 years ago

I noticed that some of our settings weren't applied; in particular, on compute nodes one of the NTP servers is set to 10.1.1.1, which is incorrect for us. It turns out that this comes from pull_variables.yml, which overrides the settings from group_vars.

It seems to me that, except for setting running_as_ansible_pull and slurm_munge_key_from_nfs to True, pull_variables.yml doesn't do anything useful. For example, where to pull from is already hardcoded in ansible-pull-script, which syncs the group_vars where everything else is defined.
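As a sketch of that suggestion, pull_variables.yml reduced to just those two settings might look like this. It assumes nothing else in the playbooks depends on other keys currently defined in the file, which I haven't verified:

```yaml
# pull_variables.yml, reduced to the two settings named above.
# Sketch only: assumes no other key in the current file is still needed.
running_as_ansible_pull: True
slurm_munge_key_from_nfs: True
```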

martbhell commented 7 years ago

It's a facts issue. ansible-pull runs locally and doesn't gather facts for other hosts, so it can't figure out the IP of the admin node from hostvars.

We could update default variables so that ntp_config_server looks like:

ntp_config_server: [ "{{ kickstart_server_ip }}", "{{ central_log_host|replace('@', '') }}" ]

Testing that out.

We don't have the admin node's IP in a variable in the examples/group_vars.

TASK [ansible-role-chrony : update chrony.conf from template] ******************
Friday 27 January 2017  10:51:06 +0200 (0:00:04.993)       0:00:25.845 ******** 
fatal: [io1]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: {{ ntp_config_server }}: [u\"{{ hostvars[groups['install'][0]]['ansible_hostname'] }}\", u\"{{ hostvars[groups['admin'][0]]['ansible_hostname'] }}\"]: 'dict object' has no attribute 'ansible_hostname'"}
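
For illustration, the proposed default can be written out with hypothetical values. The IP addresses and the rsyslog-style "@" prefix on central_log_host are assumptions, not taken from the repository:

```yaml
# Hypothetical group_vars values, for illustration only.
kickstart_server_ip: "10.1.1.2"
# Assumed to be an rsyslog-style target, where a leading "@" means UDP.
central_log_host: "@10.1.1.3"

# Proposed default: built only from plain variables, so it needs no facts
# from other hosts. With the values above it would render to
# ["10.1.1.2", "10.1.1.3"].
ntp_config_server:
  - "{{ kickstart_server_ip }}"
  - "{{ central_log_host | replace('@', '') }}"
```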

martbhell commented 7 years ago

There's a similar issue with the slurm_service_node variable, as that also comes from facts by default.

https://github.com/CSC-IT-Center-for-Science/ansible-role-fgci-install/pull/32

martbhell commented 7 years ago

Okay, after changing the fgci-install role a few times, the above ansible-pull doesn't fail anymore. The central_log_host|replace is a bit hacky, but we don't have the admin group's hostvars available in the install.yml playbook, nor do we have the admin node's IP manually entered into the group_vars. Added some documentation and reasoning.

How does it look?

Some thoughts:

- one could perhaps have multiple install nodes and want to use all of them as NTP servers
- this assumes that the install node is the slurm service node
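
If a site did want several NTP servers, one option would be to override the list explicitly in its own group_vars instead of relying on the derived default. This is a sketch with made-up addresses:

```yaml
# Hypothetical site-specific override in group_vars; addresses are invented.
ntp_config_server:
  - "10.1.1.2"   # first install node
  - "10.1.1.4"   # second install node
```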

jabl commented 7 years ago

Less clean than I had hoped for, but I guess it's unavoidable?

martbhell commented 7 years ago

Another way would be to have a central store of the facts that the nodes can query when they run ansible-pull.

I haven't looked into this or tested it, but it seems cleaner and could perhaps also provide some performance boost: http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching

Using NFS (facts in JSON files in a directory) would be convenient, but we need the variables available before we run ansible (when NFS is not available), so we'd need to run another service (like redis) on the install node with a persistent store of the facts. There are also some notes in the docs about:

- the redis client in EPEL being too old (also true for EL7?)
- beta status
- not being able to use a password while connecting to redis
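
A rough, untested sketch of how such a cache could be populated from the install node. It assumes fact caching has already been enabled in ansible.cfg there, using the fact_caching settings described in the linked docs. The play itself is hypothetical and not part of fgci-ansible:

```yaml
# Hypothetical helper play, run in push mode from the install node.
# Gathering facts for every host writes them to the configured fact cache,
# where later runs that use the same cache can read them through hostvars.
- hosts: all
  gather_facts: true
  tasks: []
```

The nodes running ansible-pull would still need network access to the cache backend and the same cache settings, which is where the caveats listed above come in.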