Revisited & Fixed Half (fp16) data support

This PR updates and extends half data support in portBLAS and includes the following changes:

Half support is enabled using the CMake option BLAS_ENABLE_HALF and is only applied to operators meant to support half according to the oneMKL spec (so far in this PR: axpy, scal and gemm).
Unit tests and benchmarks are extended to support mixed-precision comparison (reference BLAS libraries only support float/double).
Extended unit tests for axpy, scal, and gemm (+gemm_batched) using half.
Extended portblas, cublas & rocblas benchmarks for gemm (+gemm_batched).
Separated the half gemm configurations from the float/double configurations for each TUNING_TARGET.
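
To illustrate the mixed-precision comparison mentioned above, here is a minimal, standalone sketch in Python. It is not the portBLAS test code: the helper names (to_half, axpy_half, axpy_reference, almost_equal) and the tolerance values are assumptions made for this example. The idea is that the half result is checked against a reference computed in a wider type (float in the reference BLAS case, Python's double-precision float here) from the same fp16-rounded inputs, with a tolerance sized for fp16.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def axpy_half(alpha, x, y):
    """Emulated fp16 axpy: inputs and every intermediate are rounded to fp16."""
    a = to_half(alpha)
    return [
        to_half(to_half(a * to_half(xi)) + to_half(yi))
        for xi, yi in zip(x, y)
    ]


def axpy_reference(alpha, x, y):
    """Wider-precision reference, fed the same fp16-rounded inputs."""
    a = to_half(alpha)
    return [a * to_half(xi) + to_half(yi) for xi, yi in zip(x, y)]


def almost_equal(result, reference, rel_tol=2e-3, abs_tol=1e-3):
    """Element-wise comparison with illustrative (assumed) fp16 tolerances."""
    return all(
        abs(r - ref) <= abs_tol + rel_tol * abs(ref)
        for r, ref in zip(result, reference)
    )


alpha = 1.5
x = [0.1, 0.2, 0.3]
y = [1.0, 2.0, 3.0]
half_out = axpy_half(alpha, x, y)
ref_out = axpy_reference(alpha, x, y)
print(almost_equal(half_out, ref_out))
```

Rounding the inputs to fp16 before computing the reference keeps input quantisation out of the comparison, so the tolerance only has to absorb the rounding error accumulated by the half computation itself.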

Other notes:

Half precision support is disabled when targeting DEFAULT_CPU due to lack of fp16 support.
Some legacy gemm configurations for Intel GPU targets with sycl::half have been removed (they were not based on tuning, but were a temporary measure to reduce the number of generated kernels).
