CCI-MOC / m2

Bare Metal Imaging (Malleable Metal as a Service)

Ansible Playbook for BMI Installation #153

Closed by djfinn14 6 years ago

apoorvemohan commented 6 years ago

@pgrosu Could you please review this one?

radonm commented 6 years ago

The README needs an update. It says "The Bare Metal Imaging (BMI) is a core component of the Massachusetts Open Cloud", but the name is Mass Open Cloud now.

pgrosu commented 6 years ago

Hi Dan,

This is a great start, and if you prefer, you can answer my requests privately by email. Since this is not part of Travis, could you please provide me with the two Ubuntu and CentOS preconfiguration environments - these can be VMs on a specific deployment - and the step-by-step details on the settings for all the necessary configurations and minimum version restrictions? Will the Yaml files (main.yml, site.yml, etc) run as is without any changes? Where was this tested on? For which environment were the DHCP ranges created? I would need either step-by-step documentation or this information in order to add the appropriate configuration entries to the UAT framework for each of these test scenarios. As Rado indicated, if we look at the README file, statements like the following don't give me confidence in what I should do:

  1. Modify bmi_config.cfg to match whatever your current HIL and Ceph setup is.

  2. Modify dnsmasq.conf within roles/dhcp/tasks/main.yml to match your requirements.

  3. Comment out any of the roles you don't want run in site.yml

( The above was taken from: https://github.com/djfinn14/ims/blob/0eb117f424bc94e86cefbd16dc1dd9aa69aa41f9/scripts/install/production/README.md )

I am happy to test, but I would like some documentation similar to what I provided in my manuals for performing deployment validations. I'm not trying to be a pain, but I'm swamped and would rather not start guessing. We need to maintain the nice predictability we established this summer, when we were strictly document-driven. In fact, if there is no clear documentation, we should not accept PRs.

I attached the two BMI manuals as a guide and reference:

Thanks, Paul

djfinn14 commented 6 years ago

@pgrosu I sent you an email with this but also want to have it here:

> Could you please provide me with the two Ubuntu and CentOS preconfiguration environments - these can be VMs on a specific deployment - and the step-by-step details on the settings for all the necessary configurations and minimum version restrictions

I am not entirely sure what you want from me here. You need a clean VM (CentOS, RHEL or Ubuntu) that is set up to communicate with a Ceph cluster and HIL. I personally tested it in PRB by getting a clone of the bmi-dev VM, doing my best to wipe all of the packages and BMI setup within it, and then running the playbook.

> Will the Yaml files (main.yml, site.yml, etc) run as is without any changes?

Yes, the YAML files can run without any changes, but as stated, you will want to change the files I recommended so that it installs correctly for your environment.

> Where was this tested on? For which environment were the DHCP ranges created?

I first tested this on my kumo VMs. I had a CentOS and an Ubuntu VM that I could rebuild. Those tests were just to make sure things like the tgt and dnsmasq services were getting started. You can technically run the dev install scripts for Ceph and HIL, then copy bmi_config.cfg.test into bmi_config.cfg and run the Ansible playbook if you want a self-contained "toy" setup to see how it runs. My real test came on that cloned bmi-dev VM I mentioned earlier. I saved the bmi_config.cfg file and the dnsmasq config file and did my best to wipe everything else, then ran the playbook and tested to make sure I could run the normal BMI commands, such as adding an image to the database, listing the database, and provisioning/deprovisioning a node.

> Modify bmi_config.cfg to match whatever your current HIL and Ceph setup is.

If you actually look at the bmi_config.cfg file, you can see it has instructions on what to put for each field, and there is a bmi_config.cfg.test file that has example settings.

> Modify dnsmasq.conf within roles/dhcp/tasks/main.yml to match your requirements.

If you look at roles/dhcp/tasks/main.yml you can see there are pre-filled settings for the dnsmasq.conf. You can keep the defaults, or you may want to change things like the interface you are using.
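To make that concrete, a task that writes dnsmasq.conf could look roughly like the sketch below. This is not the task from the PR: the task name, the interface, and the DHCP range are placeholder values chosen for illustration, and the real role may use a different module or file layout.

```yaml
# Hypothetical sketch of a task in roles/dhcp/tasks/main.yml.
# Not copied from the PR; all values below are placeholders.
- name: Write dnsmasq.conf            # illustrative task name
  copy:
    dest: /etc/dnsmasq.conf
    content: |
      # Change these to match your own network.
      interface=eth1
      dhcp-range=10.0.0.50,10.0.0.150,12h
```

In a layout like this, changing the interface means editing the `interface=` line inside the task before running the playbook.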

> Comment out any of the roles you don't want run in site.yml

If you open the site.yml file you can see there are 3 roles listed. I tried to make each role self-contained, so if you already have tgt set up, for example, you can comment out "- tgt" and then run the playbook, and it will only install dhcp and bmi.
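As a rough illustration of the above, a site.yml with the tgt role commented out might look like the sketch below. Only the three role names (tgt, dhcp, bmi) come from this thread; the `hosts` value, the `become` setting, and the order of the roles are assumptions, not copied from the PR.

```yaml
# Hypothetical sketch of site.yml; not the actual file from this PR.
- hosts: all        # assumption: the real playbook may target a specific group
  become: yes       # assumption: package installation usually needs root
  roles:
    # - tgt         # commented out because tgt is already set up on this host
    - dhcp
    - bmi
```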

Let me know if this answers your questions.

pgrosu commented 6 years ago

Hi Dan (@djfinn14),

I am in the middle of a couple of hard research problems that I am working through and that are taking most of my time, so I'll give a quick overview of what I am asking. You have done a lot of great work here, but now there is one more bridge that needs crossing. In my experience across different software projects, the easiest way for them to grow their user base is to have a clearly guided transition to implementation from a minimal starting point. This means that you have to think like a new user, and thus educate and guide your prospective users from start to finish. That undoubtedly takes time and work beyond a set of configurations and a Readme file. Imagine you are a new user who sees our MOC/IMS GitHub location and wants to better understand why such an Ansible playbook implementation is important, how to test it from a minimal starting point, and how it will help them. I'm not saying explain everything, but if you pick a person who is new to our project or the MOC and provide him/her with your set of instructions, would they be able to reproduce them without Googling or consulting other resources? Do they understand the connection to the rest of the project? Would all users get the same result? This is a foundation of system validation. Since this is not yet part of a smoke test on a continuous-integration platform, this is even more pertinent.

Hope it makes sense and is helpful, Paul

naved001 commented 6 years ago

@pgrosu You could still review the ansible script nonetheless. Everything doesn't have to be blocked on just one thing.

pgrosu commented 6 years ago

@naved001 I understand what you are saying, but we want to spend a bit more time at the beginning to save us from simple, overlooked gaps as the project grows - otherwise this becomes more internal knowledge, which has a high probability of shrinking the user base over time. It is okay to have a human check as a secondary check, driven by a set of SOPs (Standard Operating Procedures) as the primary set of operational semantics when performing functional testing, in order to guarantee repeatability. That is why we initiated that process through a first set of manuals/guides we created over the summer. Over time we want those to become automated as a large set of tests/scenarios for continuous integration that is more thorough than Travis, and which would encompass things such as system validation.

@djfinn14 If you have time after today's meeting we can sit together for some of this.

naved001 commented 6 years ago

@pgrosu Could you make a list of things that you want @djfinn14 to do to get your approval on this PR? Please be as specific as you can and keep it simple. Once you pin down an exact set of requirements, we can work on them one at a time. Meanwhile, you could review the Ansible script itself (the main meat of this PR).

Just keep in mind that this script is aimed at people with sufficient/reasonable know-how of the Linux world.