FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0

Multithreading R2C problem #263

Open pidanself opened 2 years ago

pidanself commented 2 years ago

R2C with 8 threads and SIMD is slower than R2C with 4 threads and SIMD.

Environment:
  CPU: Intel i9 with 8 cores
  OS: macOS
  compiler: macOS gcc
  version: fftw-3.3.10
  ISA: AVX2
  threads: --enable-threads
  precision: single
  test command: ./bench -r 100 -v2 -owisdom -onthreads=4/8 60000/r60000

Below is my test data:

| size | 4 threads | 8 threads | speedup |
| -- | -- | -- | -- |
| r2c:60000 | 29000 mflops | 25000 mflops | 0.86206897 |
| c2c:60000 | 31000 mflops | 41000 mflops | 1.32258065 |
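For reference, the speedup column is simply the 8-thread throughput divided by the 4-thread throughput. The sketch below recomputes it from the figures above and also reports a scaling efficiency (measured speedup divided by the ideal 2x from doubling the thread count). The names `results`, `speedup`, and `scaling_efficiency` are only illustrative and are not part of FFTW or its bench tool.

```python
# Throughput figures copied from the table above (mflops as reported by bench).
results = {
    "r2c:60000": {4: 29000.0, 8: 25000.0},
    "c2c:60000": {4: 31000.0, 8: 41000.0},
}


def speedup(mflops_by_threads, base=4, scaled=8):
    """Ratio of throughput at `scaled` threads to throughput at `base` threads."""
    return mflops_by_threads[scaled] / mflops_by_threads[base]


def scaling_efficiency(mflops_by_threads, base=4, scaled=8):
    """Measured speedup divided by the ideal speedup (scaled / base)."""
    return speedup(mflops_by_threads, base, scaled) / (scaled / base)


for name, row in results.items():
    print(f"{name}: speedup {speedup(row):.3f}, "
          f"efficiency {scaling_efficiency(row):.3f}")
# r2c:60000: speedup 0.862, efficiency 0.431
# c2c:60000: speedup 1.323, efficiency 0.661
```

Even the C2C case reaches only about two thirds of the ideal scaling from 4 to 8 threads, so the R2C result is a stronger form of an effect that is already visible for C2C.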

C2C with 8 threads is faster than with 4 threads, but R2C with 8 threads is slower than with 4 threads.

From the data above, the R2C computation seems to have a problem.

I will run more tests to demonstrate this problem.

I guess FFTW computes R2C by transforming R2C problems into C2C problems. If C2C is normal, R2C should be normal too.
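To make that guess concrete, here is a minimal sketch of the textbook way to get a real-input transform of even length n from one complex transform of length n/2. It is only an illustration of the idea: FFTW's planner may well choose a different R2C strategy, so this is not a description of what FFTW actually does. The function names `dft` and `r2c_via_half_c2c` are hypothetical, and the naive O(n^2) `dft` stands in for a real C2C FFT.

```python
import cmath


def dft(z):
    """Naive O(n^2) complex DFT, standing in for a real C2C FFT."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def r2c_via_half_c2c(x):
    """Return bins 0..n/2 of the DFT of real x using one length-n/2 complex DFT."""
    n = len(x)
    if n % 2:
        raise ValueError("this sketch only handles even n")
    m = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    Z = dft(z)
    out = []
    for k in range(m + 1):
        a = Z[k % m]
        b = Z[(m - k) % m].conjugate()
        even = (a + b) / 2   # DFT of the even-indexed samples
        odd = (a - b) / 2j   # DFT of the odd-indexed samples
        # Combine the two half-length spectra with the twiddle factor.
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out


# Check against the direct DFT of the same real signal.
x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
reference = dft([complex(v) for v in x])[: len(x) // 2 + 1]
error = max(abs(p - q) for p, q in zip(r2c_via_half_c2c(x), reference))
print(f"max abs error: {error:.2e}")
```

Under this scheme the R2C problem of size 60000 would reduce to a C2C problem of size 30000 plus a cheap post-processing pass, so there is less work to divide among threads than in the C2C problem of size 60000.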

Are these test results normal, or is something wrong?