fgci-org / fgci-ansible

:microscope: Collection of the Finnish Grid and Cloud Infrastructure Ansible playbooks
MIT License

Ansible-pull role repository cache #73

Closed mhakala closed 8 years ago

mhakala commented 8 years ago

Currently, executing ansible-pull-script.sh fetches the whole repository of roles with "git clone". This has the drawback of being slow and flooding github (especially if there is no cache). Can this behaviour be replaced? E.g. generate the latest repository as tar.gz or provide some cached location for the entire FGCI consortium.

martbhell commented 8 years ago

Maybe something like https://github.com/beefsack/git-mirror could be used to:

Specific suggestions to reduce load on github are most welcome.

mhakala commented 8 years ago

Why not, can be tested. If it works then why not.

martbhell commented 8 years ago

If we would use git mirrors then in the requirements.yml files we can change the URL to http://{{ install_node }}/gitmirror/account/ansible-role-nhc or some such

jabl commented 8 years ago

Started work on this at https://github.com/jabl/ansible-role-gitmirror (just the bare skeleton yet).

jabl commented 8 years ago

Well, https://github.com/jabl/ansible-role-gitmirror should now work based on some limited testing. It downloads the git-mirror release tarball, unpacks it and copies the binary to a suitable location, creates a systemd unit file, creates a user&group for running the daemon, a git-mirror config file containing repos to mirror and then finally starts the whole shebang.

The remaining issue is how to do the deployment/bootstrapping in a sensible way. Initially, we need the current requirements.yml in order to pull in the required repos (including ansible-role-gitmirror!). But then after the install node is installed, we want to switch to using the mirror (assuming we want to run the mirror on the install node). So maybe we need to create some requirements_mirror.yml or something like that? In the worst case we'd have to keep the repos we use up to date in 3 places: requirements.yml, requirements_mirror.yml, and in the group_vars for ansible-role-gitmirror to use when generating the git-mirror config file. Ideally we should somehow keep the list of repos we want in a single location, but how to do that in a good way? Any suggestions?

mhakala commented 8 years ago

One solution would be to keep the repos in a single location for compute nodes, namely group_vars. 1) ansible-playbook install.yml could get the repos from there and set up the mirror on the install node based on this information; 2) ansible-playbook install.yml could also write /var/www/html/requirements.yml (to be used with ansible-pull) based on the repos in group_vars; 3) finally, ansible-pull-script.sh could be modified to use the requirements.yml on the install node.

The other nodes (admin and install) will need a separate file to bootstrap everything.

martbhell commented 8 years ago

Sounds good. I was toying with the idea of creating the vars for the mirroring by parsing the requirements.yml but it seems difficult.

So two lists: requirements.yml and group_vars?

Is it possible to set version(tag/commit) on each repo? ansible galaxy takes care of that.

martbhell commented 8 years ago

Pushed a suggestion to a branch in the fgci-install role: https://github.com/CSC-IT-Center-for-Science/ansible-role-fgci-install/commit/e263f9c3826872852139588fd5fc7c2b6a287f27

Thoughts?

It copies requirements.yml to the install node and then replaces all instances of https://github.com with http://pull_install_ip. Top of the requirements.yml looks like below. I guess we could change defaults so that they look in http://10.1.1.2/gitmirror/ ?

/edit: Just now noticed that git-mirror serves the mirrors too - it runs its own web server. I guess we could change httpd.conf on install to proxy the traffic or just point clients to http://10.1.1.2:8080/

Now to make the mirrors, it would be really nice if one could convert requirements.yml into a git-mirror config file. Any ideas for how to do that?

---
- src: http://10.1.1.2/resmo/ansible-role-ntp
  path: roles
  version: 0.4.0

- src: http://10.1.1.2/CSC-IT-Center-for-Science/ansible-role-fgci-install
  path: roles

# called -2 because it replaces another role called ansible-role-yum-cron
- src: http://10.1.1.2/jeffwidman/ansible-yum-cron
  path: roles
  name: ansible-role-yum-cron-2
  version: 9d587da913eaa82349e86b4fb9d691818538963b

jabl commented 8 years ago

Yeah, git-mirror runs its own web server (on port 8080 by default). Another thing with the urls is that it encodes the hostname in the path. So e.g. http://github.com/foo/bar.git becomes http://{{ install_ip }}:8080/github.com/foo/bar.git, although this can be changed in the config file (see the "name" directive).
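To make the URL mapping concrete, here is a minimal sketch of the default behaviour described above (upstream hostname becomes the first path segment). The function name `mirror_url` and its parameters are made up for illustration, and it assumes no "name" override is set in the git-mirror config.

```python
from urllib.parse import urlsplit


def mirror_url(origin, install_ip, port=8080):
    """Map an upstream repo URL to the URL the mirror would serve it under.

    Assumes git-mirror's default naming: the upstream hostname is kept as
    the first path segment, followed by the original path unchanged.
    """
    parts = urlsplit(origin)
    return f"http://{install_ip}:{port}/{parts.netloc}{parts.path}"


if __name__ == "__main__":
    # 10.1.1.2 stands in for the install node's IP.
    print(mirror_url("http://github.com/foo/bar.git", "10.1.1.2"))
```

The scheme of the upstream URL is dropped on purpose, since the mirror is only reachable over plain http on the install node.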

And, it should be possible to parse the requirements.yml and then generate a git-mirror config.toml as well as the requirements_mirror.yml from that. If nothing else, there's the python yaml parser which I guess ansible itself uses. But I have no idea how baroque the yaml parsing api is..
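A rough sketch of that idea, not what ended up in the pull requests. It assumes PyYAML is installed, that the git-mirror config.toml takes one `[[repo]]` table with an `Origin` key per repository (as in the git-mirror README), and the default host-in-path naming. All function names here are hypothetical.

```python
from urllib.parse import urlsplit


def load_roles(path):
    """Read a requirements.yml into a list of dicts (needs PyYAML)."""
    import yaml  # third-party: PyYAML

    with open(path) as fh:
        return yaml.safe_load(fh) or []


def to_mirror_config(roles):
    """Emit one [[repo]] table per distinct upstream URL.

    The [[repo]]/Origin layout is an assumption about git-mirror's
    config.toml format; check it against the git-mirror README.
    """
    origins = sorted({role["src"] for role in roles})
    return "\n".join(f'[[repo]]\nOrigin = "{origin}"\n' for origin in origins)


def to_mirror_requirements(roles, mirror_base):
    """Copy each entry, pointing src at the mirror and keeping other keys."""
    rewritten = []
    for role in roles:
        parts = urlsplit(role["src"])
        entry = dict(role)  # keeps path, name, version as they were
        entry["src"] = f"{mirror_base}/{parts.netloc}{parts.path}"
        rewritten.append(entry)
    return rewritten
```

The result of `to_mirror_requirements` could then be written out as requirements_mirror.yml with `yaml.safe_dump`, so the repo list only has to be maintained in the original requirements.yml.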

Another idea would be to have another role (be it ansible-role-fgci-install or e.g. ansible-role-gitmirror-fgciconfig or such) that would define the repos in defaults/main.yml and then from that it should be easy to generate the requirements.yml that uses the mirror with a jinja2 template. A question though, can one role pickup defaults from another role? That is, if we have the mirrors defined in ansible-role-gitmirror-fgciconfig, will ansible-role-gitmirror pick them up and generate a config file with all the repos or will it use its own example config from its own defaults/main.yml? Using group_vars for this isn't that good because then we'd have to rely on every fgci site doing changes themselves or then stuff will mysteriously start breaking..

Edit: OK, so basic usage of PyYAML is pretty simple. Let's see if I can cook something up.

martbhell commented 8 years ago

Roles in the same playbook can use each other's default variables.

jabl commented 8 years ago

Ok, these 2 pulls should do it:

https://github.com/CSC-IT-Center-for-Science/ansible-role-fgci-install/pull/5 https://github.com/CSC-IT-Center-for-Science/fgci-ansible/pull/101

Everything is sort of tested individually, but not together so there might be some more-or-less trivial bugs left.

martbhell commented 8 years ago

Things are merged. Is the ansible pull traffic now only internal?

martbhell commented 8 years ago

After this change it looks like the ansible-pull script is calling 10.1.1.2 and grabbing roles from there, so that looks good. I had to run "ansible-playbook install.yml -t gitmirror,fgci-install" to get all the updates applied, and after an ansible-pull run or two things look quite nice.

Big thanks to everybody involved!

jabl commented 8 years ago

All the git cloning/pulling etc., yes. Though there is still some connecting to the external world, check e.g. with

strace -econnect -f /usr/local/bin/ansible-pull-script.sh 2>&1 |grep connect|grep -v AF_LOCAL|grep -v 10.10.254.20

(replace 10.10.254.20 with your pull_install_ip).
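If the grep chain gets unwieldy, the same filtering can be sketched in Python. This is only an illustration: `external_targets` is a made-up helper, it assumes the usual strace rendering of IPv4 connect() calls (`sin_addr=inet_addr("...")`), and it ignores IPv6 and AF_LOCAL sockets. Unlike `grep -v 10.10.254.20`, it compares the address exactly rather than treating the dots as wildcards.

```python
import re

# Matches the IPv4 address in a strace line such as:
# connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("192.0.2.10")}, 16) = 0
CONNECT_RE = re.compile(r'connect\(.*sin_addr=inet_addr\("([0-9.]+)"\)')


def external_targets(strace_lines, install_ip):
    """Return the sorted IPv4 addresses connected to, other than install_ip."""
    seen = set()
    for line in strace_lines:
        match = CONNECT_RE.search(line)
        if match and match.group(1) != install_ip:
            seen.add(match.group(1))
    return sorted(seen)
```

Feeding it the captured strace output line by line (for example from a saved log file) gives the list of hosts that are still contacted outside the install node.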

That being said, ansible-pull-script now runs a lot faster than before, since cloning everything from github was really slow, so I'm not sure it's worth spending a lot of time chasing what's left over.

Edit: Seems the culprit is the line

/usr/bin/ansible-pull -s $time -U http://10.10.254.20:8080/github.com/CSC-IT-Center-for-Science/fgci-ansible.git -C production -i /root/hosts

in ansible-pull-script.