Kitware / gobig

Provisioning big data applications with Resonant
Apache License 2.0

[WIP] first pass at a test suite #38

Open opadron opened 8 years ago

opadron commented 8 years ago

Adds testing infrastructure to the gobig repository.

Performs testing against clusters of Vagrant machines. Test case playbooks (site.yml) are statically analyzed, run against the specified Vagrant cluster, and run again to test for idempotency. Finally, test cases can provide a unit test playbook (unit.yml) that is used to test the software and configuration provisioned as part of the main test.
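As a rough illustration of that flow, here is a minimal Python sketch. It is not the code in this PR: the function names, the use of `ansible-playbook --syntax-check` as the static analysis step, and the idea of reading `changed=` counts from the PLAY RECAP to judge idempotency are all assumptions made for the example.

```python
import re
from pathlib import Path


def plan_stages(case_dir, inventory):
    """Return (stage name, command) pairs for one test case directory.

    Hypothetical helper: the stage names and exact commands are assumptions.
    """
    case = Path(case_dir)
    site = str(case / "site.yml")
    unit = case / "unit.yml"

    stages = [
        # Static analysis, approximated here by Ansible's own syntax check.
        ("static", ["ansible-playbook", "--syntax-check", "-i", inventory, site]),
        # First run provisions the cluster.
        ("provision", ["ansible-playbook", "-i", inventory, site]),
        # Second, identical run should report no changes if the roles are idempotent.
        ("idempotency", ["ansible-playbook", "-i", inventory, site]),
    ]
    # The unit test playbook is optional.
    if unit.is_file():
        stages.append(("unit", ["ansible-playbook", "-i", inventory, str(unit)]))
    return stages


_CHANGED = re.compile(r"\bchanged=(\d+)")


def is_idempotent(playbook_output):
    """True when every host line in the PLAY RECAP reports changed=0."""
    parts = playbook_output.split("PLAY RECAP", 1)
    if len(parts) < 2:
        return False  # no recap found: treat the run as a failure
    counts = _CHANGED.findall(parts[1])
    return bool(counts) and all(int(n) == 0 for n in counts)
```

A runner would execute each command in order (for example with `subprocess.run`) and pass the captured output of the second run to `is_idempotent`.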

opadron commented 8 years ago

Here's what I've managed so far:

While working out how to make the testing work, I realized that neither Travis nor any other free CI service offers full virtualization in its testing environments, which is what GoBig needs in order to avoid dependence on third-party resources. We could set things up so that Travis would spin up AWS instances, for example, but that kind of setup opens a can of security worms. Since the roles in GoBig are meant for provisioning clusters, and it'd be preferable that they be properly tested without the need for our own CI service, virtualization is the only game left in town.

After exploring options ranging from the most limited to the hilariously impractical, I've decided in the end to simply not run test playbooks as part of the CI process. For now, only static analysis checks are run when the lack of full virtualization support is detected (FULL_VIRTUALIZATION=NO). This arrangement excludes the ansible provisioning tests, the ansible idempotency tests, the ansible unit tests, and the aggregate coverage tests (since the coverage numbers would be abysmal in this case).
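To make the gating concrete, here is a minimal Python sketch of how the stage selection could work. Only the `FULL_VIRTUALIZATION=NO` setting and the list of skipped tests come from the description above; the stage names, the `YES` value, and the fallback of looking for the `vmx` (Intel) or `svm` (AMD) CPU flags in `/proc/cpuinfo` are assumptions for illustration and may not match how the suite actually detects support.

```python
ALL_STAGES = ("static", "provision", "idempotency", "unit", "coverage")


def has_hw_virtualization(cpuinfo_text):
    """Check /proc/cpuinfo-style text for the Intel (vmx) or AMD (svm) flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and {"vmx", "svm"} & set(value.split()):
            return True
    return False


def select_stages(environ, cpuinfo_text=""):
    """Pick the stages to run; an explicit FULL_VIRTUALIZATION setting wins."""
    setting = environ.get("FULL_VIRTUALIZATION", "").strip().upper()
    if setting == "NO":
        full = False
    elif setting == "YES":  # assumed counterpart of NO
        full = True
    else:
        # Assumed fallback: guess from the CPU flags when nothing is set.
        full = has_hw_virtualization(cpuinfo_text)
    # Without full virtualization, only the static analysis checks remain.
    return ALL_STAGES if full else ("static",)
```

On a CI worker this would be called as `select_stages(os.environ, open("/proc/cpuinfo").read())`, so Travis ends up with only the static checks while a capable host gets the whole suite.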

Running the test suite on a non-virtualized host should run the entire suite, which we'll need to do manually ourselves. If you do this, make sure that you're on a beefy system, as the demand on its resources will be substantial.

At the moment, the style checks for the Python code in filter_plugins are the only thing causing Travis to fail. I hope to get around to cleaning those files up in the near future.

opadron commented 8 years ago

After trying to run the full test suite on EC2, I realized that not even EC2 provides full virtualization (I probably should have figured). So going the virtualization route really does require a bona fide physical system with enough weight to handle the testing. Besides the fact that I have no such system of my own on which to run the full suite, I think we should try to make the suite fully accessible to Travis, EC2, and other environments as a matter of course (even if we require that it be explicitly enabled when virtualization is unavailable).

So, that leaves us with emulation as opposed to virtualization. The idea is to use Vagrant with a libvirt-based provider to spin up KVM and/or QEMU hosts and run the test playbooks on them. If it works, this setup should allow the test suite to be run on Travis and elsewhere, albeit at what would likely be a monumental performance hit. I expect the performance would be bad enough that we'd still need to disable most of the suite for Travis.
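A minimal Python sketch of that idea, under stated assumptions: the test runner picks KVM when `/dev/kvm` is usable and falls back to plain QEMU emulation otherwise, then starts the cluster with `vagrant up --provider=libvirt`. The `GOBIG_LIBVIRT_DRIVER` environment variable is hypothetical; it assumes a Vagrantfile that reads it and passes the value to the libvirt provider's driver setting. Nothing here has been tried against Travis.

```python
import os


def choose_driver(kvm_device="/dev/kvm"):
    """Use KVM when the device is usable, otherwise pure QEMU emulation."""
    if os.access(kvm_device, os.R_OK | os.W_OK):
        return "kvm"
    return "qemu"  # much slower, but needs no hardware virtualization


def vagrant_up(driver, base_env=None):
    """Return (command, environment) for bringing up the libvirt cluster."""
    env = dict(os.environ if base_env is None else base_env)
    # Hypothetical variable: the Vagrantfile would have to read it.
    env["GOBIG_LIBVIRT_DRIVER"] = driver
    return ["vagrant", "up", "--provider=libvirt"], env
```

The pair returned by `vagrant_up(choose_driver())` can be handed to `subprocess.run(command, env=env, check=True)` from the directory holding the test case's Vagrantfile.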

So, this is what I think, but I'd rather hear what others think before I go down that rabbit hole.

ping @Kitware/gobig

jeffbaumes commented 8 years ago

For a functioning collaborative repo, I think a general guideline is that Travis needs to complete in under 10 minutes, though maybe we push that for something as heavy as what we are trying to test here.

I do not know enough about the proposed approach to make an informed decision, but it appears that other approaches have been attempted and this is what we are left with, in which case this method seems fine. Does anyone else see an alternative?

opadron commented 8 years ago

To clarify, in case it's not clear: I don't propose to dispose of what I have now, which uses proper virtualization, but rather to disable it by default and add another mechanism that uses emulation, which would also be disabled by default (so that feedback from Travis remains timely). We'll still need to run the full suite from time to time, and I suspect that we'd like to be able to do so on a single EC2 instance; hence the suggestion of emulation.

If we have dedicated hardware in-house with enough storage, memory, and CPU cycles, then we can skip emulation and just use what we have. Even weak hardware will do if we're willing to directly target EC2 instances instead of local VMs. In either case, it needs to be in-house to prevent sharing any credentials with third parties, and we'd be on the hook for tying our own CI solution (e.g., Jenkins or Buildbot) to GitHub.