@giordano suggested that OpenBLAS and ARMPL currently do not fully exploit vectorization on the aarch64 architecture, so on Fujitsu CPUs it is better to use FujitsuBLAS.
He has a package, giordano/FujitsuBLAS.jl, that automatically redirects BLAS calls to FujitsuBLAS.
Beware that the actual FujitsuBLAS library has to be installed separately, and I don't know whether it is available on CTE-ARM.
I would like a benchmark of some big matrix multiplications comparing OpenBLAS and FujitsuBLAS.
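
As a starting point, the comparison could look roughly like the sketch below. It is only a sketch under some assumptions: the `USE_FUJITSUBLAS` environment variable and the `gemm_gflops` helper are hypothetical names made up for this example, the matrix sizes are arbitrary, and it assumes that loading FujitsuBLAS.jl is enough to redirect the BLAS calls (as the package description says) and that Julia is recent enough (1.7 or later) to provide `BLAS.get_config()`.

```julia
using LinearAlgebra

# Hypothetical switch: run the script once without it (default OpenBLAS)
# and once with USE_FUJITSUBLAS=1 on a node where FujitsuBLAS is installed.
if get(ENV, "USE_FUJITSUBLAS", "0") == "1"
    using FujitsuBLAS  # assumed to redirect BLAS calls on load
end

# Time C = A * B for n x n Float64 matrices and return GFLOP/s,
# using the best of `samples` runs after one warm-up call.
function gemm_gflops(n; samples = 3)
    A = rand(Float64, n, n)
    B = rand(Float64, n, n)
    C = zeros(Float64, n, n)
    mul!(C, A, B)  # warm-up, excludes compilation from the timing
    t = minimum(@elapsed(mul!(C, A, B)) for _ in 1:samples)
    return 2 * n^3 / t / 1e9  # a dense matmul costs about 2n^3 flops
end

# Show which BLAS backend is actually in use.
println(BLAS.get_config())

for n in (1024, 2048, 4096)
    println("n = $n: ", round(gemm_gflops(n); digits = 1), " GFLOP/s")
end
```

Running it twice with the same number of BLAS threads (for example set through `BLAS.set_num_threads`) should give comparable numbers for the two backends.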
This issue may be related to #95 and #100.