Support --mcpu flag and autodetect to trigger vectorization on proper vector lengths

This PR passes proper llvm::TargetMachine information to llvm_jit and codegen_llvm. It does so by constructing a single TargetMachine at the LLVMJit level, which avoids introducing ad hoc objects.

The TargetMachine is constructed from the --mcpu flag if one is passed, and otherwise from the host's cpuid information.
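The selection rule can be sketched as follows. This is a minimal illustration, not the code in this PR: `selectCpuName` and its `detectHost` parameter are hypothetical names, and the detector is injected so the logic can be exercised without LLVM. In a real build the detector would presumably wrap something like `llvm::sys::getHostCPUName()`, which is an assumption about how the autodetection is wired.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not the PR's code): pick the CPU name used to build
// the target machine. A non-empty flag value wins; otherwise the supplied
// detector is asked for the host CPU name.
std::string selectCpuName(const std::string& mcpuFlag,
                          const std::function<std::string()>& detectHost) {
  if (!mcpuFlag.empty()) {
    return mcpuFlag;  // explicit --mcpu=<name> overrides autodetection
  }
  return detectHost();  // fall back to whatever the host reports
}
```

Keeping the detector as a parameter makes the override behavior easy to test on any machine, independently of the CPU the test actually runs on.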

As a consequence, one can now emit AVX2 and AVX512 code. Before this commit, the TargetMachine was essentially a default one, so only AVX code was generated.
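To make the link between instruction set and vector length concrete, here is a small sketch that maps a few x86 feature names to register width and lane count, plus a host probe. All function names are hypothetical and none of this is taken from the PR. The probe relies on the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and returns an empty string on other compilers or architectures.

```cpp
#include <cstddef>
#include <string>

// Widest SIMD register width, in bits, for a few x86 feature names.
// Unknown names map to 0, meaning "no vector width known".
std::size_t vectorBits(const std::string& isa) {
  if (isa == "avx512f") return 512;
  if (isa == "avx2" || isa == "avx") return 256;
  if (isa == "sse2") return 128;
  return 0;
}

// Number of elements of elemBytes bytes that fit in one vector register.
std::size_t lanes(const std::string& isa, std::size_t elemBytes) {
  return elemBytes == 0 ? 0 : vectorBits(isa) / (8 * elemBytes);
}

// Hypothetical host probe: the widest feature the running CPU reports.
// Assumes GCC or Clang on x86; returns "" everywhere else.
std::string hostIsa() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return "avx512f";
  if (__builtin_cpu_supports("avx2")) return "avx2";
  if (__builtin_cpu_supports("avx")) return "avx";
  if (__builtin_cpu_supports("sse2")) return "sse2";
#endif
  return "";
}
```

For 32-bit floats this gives 8 lanes with AVX or AVX2 and 16 lanes with AVX512. Comparing `vectorBits` for a requested feature against `vectorBits(hostIsa())` is one simple way to notice that a forced architecture is wider than the host supports.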

To test this and inspect the generated code, run:

```
cd build && \
make -j 16 test_mapper_llvm && \
./test/test_mapper_llvm --logtostderr=1 --llvm_dump_asm=1 --llvm_dump_after_opt=1 --llvm_dump_before_opt=1 --gtest_filter="*Batch*" --mcpu=skylake
```

Of course, if one forces a more recent architecture than the host actually has, the generated code will likely contain illegal instructions and fail at run time, but at least the asm will be printed properly.

facebookresearch/TensorComprehensions, pull request #588