amd / amd-fftw

FFTW code optimized for AMD based processors
GNU General Public License v2.0
48 stars 12 forks source link

hybrid OpenMP+MPI tests fail with --enable-amd-trans #2

Closed migueldiascosta closed 4 years ago

migueldiascosta commented 5 years ago
perl -w ./check.pl  -r -c=30 -v `pwd`/bench
         FFTW transforms passed basic tests!
perl -w ./check.pl  -r -c=30 -v --nthreads=2 `pwd`/bench
         FFTW threaded transforms passed basic tests!
perl -w ../tests/check.pl --verbose --random --maxsize=10000 -c=10  --mpi "mpirun -np 1 `pwd`/mpi-bench"
     MPI FFTW transforms passed 10 tests, 1 CPU
perl -w ../tests/check.pl --verbose --random --maxsize=10000 -c=10  --mpi "mpirun -np 2 `pwd`/mpi-bench"
      MPI FFTW transforms passed 10 tests, 2 CPUs
perl -w ../tests/check.pl --verbose --random --maxsize=10000 -c=10  --mpi "mpirun -np 3 `pwd`/mpi-bench"
      MPI FFTW transforms passed 10 tests, 3 CPUs
perl -w ../tests/check.pl --verbose --random --maxsize=10000 -c=10  --mpi "mpirun -np 4 `pwd`/mpi-bench"
      MPI FFTW transforms passed 10 tests, 4 CPUs
perl -w ../tests/check.pl --verbose --random --maxsize=10000 -c=10  --mpi --nthreads=2 "mpirun -np 3 `pwd`/mpi-bench"
Found relative error 1.000000e+00 (impulse 1)

this is for float datatype using both --enable-amd-opt and --enable-amd-trans (no problem without --enable-amd-trans)

migueldiascosta commented 5 years ago

I understand --enable-amd-trans is meant for single thread only, but then it should be automatically disabled in the threaded ones, no? And the pure OpenMP tests do pass, what fails is the hybrid OpenMP+MPI one...

migueldiascosta commented 5 years ago

Ok, if I completely disable both OpenMP and MPI the tests pass - it would be better if --enable-amd-trans was simply ignored in those cases, no?

Otherwise, at least in the EasyBuild context I suppose we can tweak the fftw easyblock to only allow --enable-amd-trans for the sequential cases, or build a completely separate sequential module

(Performance difference is marginal for the default gearshifft tests, I suppose they are simply too small)

migueldiascosta commented 5 years ago

added sequential variant to https://github.com/easybuilders/easybuild-easyconfigs/pull/8783

BiplabRaut commented 5 years ago

Hi Miguel Dias Costa, Thank you for reporting this issue.

The option "--enable-amd-trans" uses a new transpose method and adds a minor performance improvement. But this optional feature has a known issue in case of threaded/Hybrid execution modes. We suggest not to enable this option when you are running in cases of threaded/Hybrid tests.

For now, for threaded/Hybrid benchmark tests, please use amd-fftw with "--enable-amd-opt" option only (which is the main switch responsible for major performance improvements). Additionally, "--enable-amd-opt" is supported only for float and double libraries as of now.

I understand that "--enable-amd-trans" option should have been disabled for multi-threaded case automatically. Sorry for the inconvenience. We will plan to take care of this in the next release.