Modernize Travis settings for CI performance

MaddieM4 commented 9 years ago

I'm grouping two config changes that, together, should have an immensely positive effect on CI response time. Not that we're currently hurting: the last run took about a minute for the parallel builds (5:43 total, in the same sense that man-hours and CPU time add up in parallel). But we can always go faster, and there's some low-hanging fruit to be had.

The first thing is the more obvious, easy, and high-impact one: use Travis' new container infrastructure. This comes with a lot of performance benefits, the limitations do not affect how we do things (we're not using sudo, etc.), and it's as easy as adding sudo: false to the config.

The second thing is enabling caching of our dependencies. This allows us to reduce the install time for such things as Template Toolkit, Spiffy, etc. Anything installed during the Travis build process will be preserved for later runs. When the cpanm command runs, it will find that all the packages are already installed and exit sooner, having done less work. This is less of a win, because we don't spend that much time building our deps thanks to --notest, but it does turn a bunch of serial requests to CPAN into a single fast tarball retrieval from S3 (under the hood, see docs link).

I'm definitely going to do the containerization change, and will try the caching to see if it has any benefit. For that kinda thing, optimization can be counterintuitive - under the right (wrong?) conditions, caching may actually be slower! So empirical evidence is king.

MaddieM4 commented 9 years ago

Actually, it may well turn out that caching is a bigger win than containerization, although containerization does seem to be just the tiniest bit faster (well within the range of variance so far). According to the build log, we spend around 15s in cpanm. For shame! That's exactly what caching should cut down, in theory.

FWIW, it's possible that the most important speedup of containers - revving up virtual environments before starting them - is something that the Travis metrics do not count in the times. With all the other things that happen between push and build-start, it's hard to say how much time we're shaving off before Travis starts the clock, without potentially blaming either approach for some temporary hiccup in (say) the web hook from GitHub to Travis.

MaddieM4 commented 9 years ago

Caching backfired spectacularly, as the supposedly fast S3 store/retrieve is significantly slower than just fetching from CPAN and building. Sure, it brought our CPAN overhead down from 15s to 0.25s. But it also incurred about 2 minutes' worth of its own cost. This would be a godsend if we had a truly slow dependency install time, but that install time would have to be over 2 minutes to cross the threshold of usefulness.
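For anyone who wants to check the arithmetic, here is a rough sketch of the trade-off using the figures quoted above. The 120s cache cost is only the "about 2 minutes" estimate, the function names are made up for illustration, and the break-even figure assumes the cached install stays at roughly 0.25s.

```python
# Figures from the comment above; the cache overhead is approximate.
INSTALL_WITHOUT_CACHE_S = 15.0   # time spent in cpanm with no cache
INSTALL_WITH_CACHE_S = 0.25      # time spent in cpanm with a warm cache
CACHE_OVERHEAD_S = 120.0         # "about 2 minutes" of S3 store/retrieve


def net_change_s(install_without, install_with, cache_overhead):
    """Return the net change in build time; positive means caching is slower."""
    saved = install_without - install_with
    return cache_overhead - saved


def break_even_install_s(install_with, cache_overhead):
    """Uncached install time above which caching would start to pay off."""
    return cache_overhead + install_with


net = net_change_s(INSTALL_WITHOUT_CACHE_S, INSTALL_WITH_CACHE_S, CACHE_OVERHEAD_S)
threshold = break_even_install_s(INSTALL_WITH_CACHE_S, CACHE_OVERHEAD_S)
print(f"net change with caching: {net:+.2f}s")
print(f"break-even uncached install time: {threshold:.2f}s")
```

With these numbers caching saves 14.75s of install time but costs roughly 120s, so each job ends up well over a minute and a half slower.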

We'll just take our minor speedups from the container change, and call it good.

MaddieM4 commented 9 years ago

Merged 9289cf1

MaddieM4 commented 9 years ago

In retrospect, it looks like the biggest bottleneck is the number of parallel jobs Travis will run per project. The behavior I'm seeing is that we get 5 jobs in parallel, but the 6th will not start until one of the first 5 completes. There's nothing special about the 6th; it builds as fast as any other once it's allowed to start - so this effectively doubles the wall-clock time.
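A simplified model of why the 6th job doubles the wall-clock time, as a sketch only: it assumes every job takes the same time (about a minute, per the first comment) and that jobs run in strict waves of at most 5. The function name and the 60s figure are illustrative, not anything Travis reports.

```python
import math


def wall_clock_s(num_jobs, max_parallel, job_duration_s):
    """Estimate wall-clock time when equal-length jobs run in waves."""
    # Each wave runs up to max_parallel jobs; a partial wave still costs
    # a full job duration because its jobs cannot start any earlier.
    waves = math.ceil(num_jobs / max_parallel)
    return waves * job_duration_s


JOB_DURATION_S = 60  # assumed: "about a minute" per job

six_jobs = wall_clock_s(6, 5, JOB_DURATION_S)
five_jobs = wall_clock_s(5, 5, JOB_DURATION_S)
print(f"6 jobs, 5 at a time: {six_jobs}s")
print(f"5 jobs, 5 at a time: {five_jobs}s")
```

Under these assumptions, going from 5 jobs to 6 adds a whole second wave, which matches the doubling described above.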

I'm loath to remove jobs, since it seems that your parallelism is actually determined by the load on Travis CI as a whole, which is completely out of my hands :smile: Reducing the number of jobs improves the odds of being able to do everything in parallel, but never guarantees such an outcome. Then again, I think there may be some hardcoded cap, since 5 is such a round number, so that the probability of doing 6 jobs in parallel is always 0%, regardless of load on Travis CI. Will consult with @ingydotnet on this matter.

MaddieM4 / jemplate

Modernize Travis settings for CI performance #6