Switch to STREAM with heap allocation

The STREAM benchmark used with LLAMA is the base version from https://www.cs.virginia.edu/stream/FTP/Code/. However, there is also a version that allocates its arrays dynamically with posix_memalign, announced in the news with implementation here. Because the benchmark arrays in that version are not statically allocated, it does not suffer from the problem that linking fails when the array size is set too high (it hits limits of ELF). We should switch to this version so we can run bigger problem sizes, especially on cluster CPUs.
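
To illustrate the difference, here is a minimal sketch of heap-allocating a STREAM-style array with posix_memalign instead of declaring it as a static array. This is not the code from the upstream dynamically allocated version. The helper names `alloc_stream_array` and `stream_triad` are hypothetical, and the `STREAM_TYPE` default of `double` is an assumption that mirrors the macro in the base stream.c.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef STREAM_TYPE
#define STREAM_TYPE double /* assumption: same default as the base stream.c */
#endif

/* Hypothetical helper: allocate n elements on the heap with the given
 * alignment. The alignment must be a power of two and a multiple of
 * sizeof(void *), as posix_memalign requires. Returns NULL on failure.
 * The array size is a runtime value, so it does not affect linking. */
static STREAM_TYPE *alloc_stream_array(size_t n, size_t alignment)
{
    void *p = NULL;
    if (n > SIZE_MAX / sizeof(STREAM_TYPE))
        return NULL; /* n * sizeof(STREAM_TYPE) would overflow */
    if (posix_memalign(&p, alignment, n * sizeof(STREAM_TYPE)) != 0)
        return NULL;
    return (STREAM_TYPE *)p;
}

/* Hypothetical helper: the STREAM triad kernel, a[j] = b[j] + scalar * c[j],
 * operating on heap-allocated arrays. */
static void stream_triad(STREAM_TYPE *a, const STREAM_TYPE *b,
                         const STREAM_TYPE *c, STREAM_TYPE scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

A caller would allocate the three arrays with, for example, `alloc_stream_array(n, 64)`, check each result for NULL, run the kernels, and release the arrays with `free`. The alignment of 64 bytes is only an example value.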

alpaka-group / llama

Switch to STREAM with heap allocation #715