Attempt to provide SIMD performance estimates

Habakkuk currently makes no attempt to deduce what performance might be obtained by SIMD-vectorising the loops that it finds. The only way to account for this at the moment is to assume perfect SIMD vectorisation and multiply the performance estimate that Habakkuk produces by the vector length (e.g. 2 for SSE and 4 for AVX2, assuming double-precision operands).
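To make the current workaround concrete, here is a minimal sketch of the "perfect SIMD" scaling. The function name, the width table and the `element_bits` parameter are hypothetical and not part of Habakkuk; the sketch only assumes that the number of lanes is the register width divided by the operand size.

```python
# Hypothetical helper, not part of Habakkuk: all names are illustrative.
# Register widths in bits for the two instruction sets mentioned above.
VECTOR_WIDTH_BITS = {"SSE": 128, "AVX2": 256}


def perfect_simd_estimate(scalar_estimate, isa, element_bits=64):
    """Scale a scalar performance estimate by the number of SIMD lanes.

    Assumes every operation in the loop vectorises perfectly, which is
    an upper bound rather than a prediction.
    """
    lanes = VECTOR_WIDTH_BITS[isa] // element_bits
    return scalar_estimate * lanes


# Usage: a scalar estimate of 1.5 FLOPs/cycle with double-precision data.
sse_estimate = perfect_simd_estimate(1.5, "SSE")
avx2_estimate = perfect_simd_estimate(1.5, "AVX2")
```

The result is an upper bound, since it ignores non-contiguous accesses and any operations that cannot be vectorised.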

Since Habakkuk already has support for loop unrolling we could, in principle, unroll the loop by the vector length and look to pack contiguous array accesses into 'vector' variables/nodes. We will investigate the feasibility of doing this in this ticket.
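The unroll-and-pack idea could be sketched roughly as follows. This is not Habakkuk's internal representation: the `(name, stride, offset)` tuples, and the `unroll` and `pack` functions, are assumptions made purely for illustration. Each access is taken to have the form `name(stride*i + offset)` for loop variable `i`.

```python
# Hypothetical sketch, not Habakkuk code. An access (name, stride, offset)
# stands for the array reference name(stride*i + offset) in the loop body.

def unroll(accesses, vector_length):
    """Return, for each access, the element offsets touched by
    vector_length consecutive iterations of the loop."""
    return [
        (name, [stride * k + offset for k in range(vector_length)])
        for name, stride, offset in accesses
    ]


def pack(unrolled):
    """Split unrolled accesses into packable 'vector' nodes and scalars.

    An access is packed only if its unrolled copies touch consecutive
    elements; otherwise every copy remains a separate scalar access.
    """
    vectors, scalars = [], []
    for name, offsets in unrolled:
        if all(b - a == 1 for a, b in zip(offsets, offsets[1:])):
            vectors.append((name, offsets[0]))  # one node per access
        else:
            scalars.extend((name, off) for off in offsets)
    return vectors, scalars


# Usage: loop body a(i) = b(i) + c(2*i), unrolled by a vector length of 4.
body = [("a", 1, 0), ("b", 1, 0), ("c", 2, 0)]
vectors, scalars = pack(unroll(body, 4))
```

In this example the unit-stride accesses to `a` and `b` each become a single vector node, while the stride-2 access to `c` stays as four scalar accesses, so a performance estimate would need to cost the two kinds differently.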
