Add support for parallelization

In large projects, happo run can take too long. Although we have some performance improvements in the pipeline (#62), there is a limit to how fast we can make things. We need to provide a way to split examples up among multiple machines and aggregate the results at the end.

There is a good amount of overlap with #73, so it might make sense to do both at the same time.

To enable this, I think we need to do two things:

1. Add options to happo run to run on only a subset of examples and to output metadata about the run.
2. Add a mechanism to happo to aggregate partial results into a single result.
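To make the second step more concrete, here is a minimal sketch of what aggregation could look like. It assumes each partial run writes out a small metadata object saying which chunk it ran, how many chunks there are in total, and which snapshots it produced. The names Snapshot, PartialResult, and aggregateResults are hypothetical and do not exist in happo today.

```typescript
// Hypothetical metadata a single partial `happo run` might emit.
// These shapes are assumptions for illustration, not happo's actual format.
interface Snapshot {
  component: string;
  variant: string;
  target: string;
}

interface PartialResult {
  chunk: number; // 1-based index of the chunk this machine ran
  total: number; // total number of chunks (machines)
  snapshots: Snapshot[];
}

// Merges partial results from every machine into one list of snapshots.
// Throws if a chunk is missing, duplicated, out of range, or disagrees
// with the others about the total number of chunks.
function aggregateResults(partials: PartialResult[]): Snapshot[] {
  if (partials.length === 0) {
    throw new Error("No partial results to aggregate");
  }
  const total = partials[0].total;
  const seenChunks = new Set<number>();
  for (const p of partials) {
    if (p.total !== total) {
      throw new Error(`Chunk ${p.chunk} expected ${p.total} chunks, others expected ${total}`);
    }
    if (p.chunk < 1 || p.chunk > total) {
      throw new Error(`Chunk ${p.chunk} is out of range for ${total} chunks`);
    }
    if (seenChunks.has(p.chunk)) {
      throw new Error(`Chunk ${p.chunk} was reported more than once`);
    }
    seenChunks.add(p.chunk);
  }
  if (seenChunks.size !== total) {
    throw new Error(`Expected ${total} chunks but received ${seenChunks.size}`);
  }
  // Sort by chunk index so the merged result does not depend on the
  // order in which the machines happened to finish.
  return [...partials]
    .sort((a, b) => a.chunk - b.chunk)
    .flatMap((p) => p.snapshots);
}
```

Failing loudly on a missing chunk seems preferable to silently producing a partial report, since a crashed machine would otherwise look like a set of deleted examples.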

In the interest of keeping the API simple, it seems like the arguments we want are the total number of splits (i.e. the number of machines to parallelize across) and which split to run. For instance, with 4 machines you would call happo run four times, with arguments like happo run 1/4, happo run 2/4, happo run 3/4, and happo run 4/4. The arguments could also use more explicit flags, something like happo run --split=1 --of=2 (naming needs to be improved). This only works if the order of examples is deterministic, so that every machine computes the same split.
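A rough sketch of how the "k/n" argument could be parsed and turned into a deterministic subset of examples is below. The function names parseSplitArg and selectChunk, and the idea of sorting by a string key per example, are assumptions for illustration rather than anything happo currently does.

```typescript
// Parses a hypothetical "k/n" argument such as "2/4" into a 1-based
// chunk index and a total chunk count.
function parseSplitArg(arg: string): { chunk: number; total: number } {
  const match = /^(\d+)\/(\d+)$/.exec(arg);
  if (!match) {
    throw new Error(`Expected an argument like "2/4", got "${arg}"`);
  }
  const chunk = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || chunk < 1 || chunk > total) {
    throw new Error(`Chunk ${chunk} is out of range for ${total} chunks`);
  }
  return { chunk, total };
}

// Returns the examples one machine should render. Sorting by a stable key
// first means every machine computes the same partition regardless of the
// order in which examples were discovered. Plain < / > comparison is used
// instead of localeCompare so the result does not depend on machine locale.
function selectChunk<T>(
  examples: T[],
  keyOf: (example: T) => string,
  chunk: number,
  total: number,
): T[] {
  const sorted = [...examples].sort((a, b) => {
    const ka = keyOf(a);
    const kb = keyOf(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  // Contiguous chunks whose sizes differ by at most one; together the
  // chunks cover every example exactly once.
  const start = Math.floor(((chunk - 1) * sorted.length) / total);
  const end = Math.floor((chunk * sorted.length) / total);
  return sorted.slice(start, end);
}
```

With 10 examples and 4 machines this yields chunks of sizes 2, 3, 2, and 3. Contiguous slices are one option; a round-robin assignment (index modulo total) would work just as well, as long as every machine uses the same rule.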

cc @lelandrichardson

Galooshi / happo

Add support for parallelization #171