R2C with SIMD is slower on 8 threads than on 4 threads.
Environment:
CPU: Intel i9 with 8 cores
OS: macOS
compiler: macOS gcc
version: fftw-3.3.10
ISA: AVX2
threads: --enable-threads
precision: single
test command: ./bench -r 100 -v2 -owisdom -onthreads=4/8 60000/r60000
Below is my test data:
size | 4 threads | 8 threads | speedup
-- | -- | -- | --
r2c:60000 | 29000 mflops | 25000 mflops | 0.86206897
c2c:60000 | 31000 mflops | 41000 mflops | 1.32258065
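
For reference, the speedup column is the 8-thread throughput divided by the 4-thread throughput, so a value below 1.0 means 8 threads is slower. A minimal sketch that recomputes it from the mflops figures above (the `speedup` helper and the `results` dict are names made up for illustration, not part of FFTW or its bench tool):

```python
def speedup(mflops_4_threads: float, mflops_8_threads: float) -> float:
    """Ratio of 8-thread to 4-thread throughput; below 1.0 means 8 threads is slower."""
    return mflops_8_threads / mflops_4_threads


# Measured throughput in mflops, copied from the table: (4 threads, 8 threads).
results = {
    "r2c:60000": (29000, 25000),
    "c2c:60000": (31000, 41000),
}

for size, (four, eight) in results.items():
    print(f"{size}: {speedup(four, eight):.8f}")
```

Only the r2c case comes out below 1.0; the c2c case at the same size still scales from 4 to 8 threads.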