migueldiascosta closed 4 years ago
on the bright side, I do see an average ~15% performance boost using the gearshifft benchmark, I hope you're planning to contribute these back to FFTW
Hi Miguel Dias Costa,
Thanks for the feedback. The --enable-amd-opt option currently works only for float and double and has not been tested for other datatypes. We missed mentioning this in the documentation. Apologies for that. We will fix this soon.
Regarding the gearshifft benchmark, it is nice to know that you are seeing improved performance. It would be great if you could provide more details on the types of FFTs and the sizes where you are observing the improvement. We will also try to run the benchmark on our end and analyze the results.
@pradeeptrgit thanks for the clarification on the long double and quad precisions
Regarding the benchmarks, I used the default gearshifft tests (I'll submit a PR with the AMD FFTW variant and gearshifft using it to the easybuild easyconfigs repository soon).
Those were single-threaded tests. Using 2 threads, the average speedup is a bit higher, ~20%, and then goes down to ~10% for 4 threads, but for threaded runs there are other factors (e.g., processor affinity, NUMA effects) and I'm not as interested in those (I mostly use one thread per MPI process of whatever application)
These were also without --enable-amd-trans, the tests fail for me if I use it, I'll open a separate issue on that
Hi Miguel Dias Costa, Thank you for providing details on configuration of the benchmark tests performed by you. Regarding the flag “--enable-amd-trans”, I will provide more details in my reply under the separate issue reported by you.
However, let me share a few important details of this AMD optimized FFTW library (amd-fftw).
on the bright side, I do see an average ~15% performance boost using the gearshifft benchmark, I hope you're planning to contribute these back to FFTW
Hi Miguel Dias Costa, Can you tell us which test dataset you ran? We would like to reproduce the test results on our machines. Since there are multiple "extents" files in gearshifft, we would like to know which one is most widely used and referred to. Secondly, gearshifft seems to log its output only after all the test cases complete. Is there any configuration in gearshifft to enable output after each individual test completes? (This would allow us to stop the gearshifft tests partway through a long run without losing the results so far.)
@BiplabRaut my initial tests were with gearshifft's "default fallback" extents, which are indeed too small. The speedups are more modest as the size increases (and there is a large variation...)
Regarding the long runs, what I did was loop (e.g. in bash) over the extents (e.g., in one of the provided files) in order to have a separate run and output per extent (using the -e and -o arguments)
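A rough sketch of such a loop, in case it helps with reproducing the runs. The binary name (gearshifft_fftw), the extents file path, the results directory, and the assumption that the file holds one extent per line with '#' comments are all placeholders to adapt; only the -e and -o arguments come from the description above.

```shell
#!/bin/sh
# Sketch: one gearshifft run (and one output file) per extent, so a long sweep
# can be interrupted without losing the results that were already written.
# Assumed, not taken from the thread: binary name, file paths, file layout.

GEARSHIFFT_BIN="${GEARSHIFFT_BIN:-gearshifft_fftw}"   # assumed binary name
EXTENTS_FILE="${EXTENTS_FILE:-extents_small.conf}"    # assumed path
OUT_DIR="${OUT_DIR:-results}"                         # hypothetical output directory

# Map an extent such as "128x128" to an output path such as "results/128x128.csv".
output_for_extent() {
    printf '%s/%s.csv\n' "$OUT_DIR" "$1"
}

run_all_extents() {
    mkdir -p "$OUT_DIR"
    # Assumes one extent per line; blank lines and '#' comments are skipped.
    while IFS= read -r extent; do
        case "$extent" in ''|'#'*) continue ;; esac
        # Redirect stdin so the benchmark cannot consume the rest of the extents file.
        "$GEARSHIFFT_BIN" -e "$extent" -o "$(output_for_extent "$extent")" < /dev/null
    done < "$EXTENTS_FILE"
}

# Run only when both the extents file and the benchmark binary are available.
if [ -f "$EXTENTS_FILE" ] && command -v "$GEARSHIFFT_BIN" >/dev/null 2>&1; then
    run_all_extents
fi
```

Each extent then gets its own output file, so stopping the script midway keeps everything finished up to that point.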
I'm afraid I don't have any particular insight on gearshifft or how it is widely used - you may want to refer to https://arxiv.org/abs/1702.00629, if you haven't already
Hi Miguel Dias Costa, AMD-FFTW 2.1 is released with more optimizations. Can you please repeat your gearshifft tests with this new release and let us know your results.
Thank you.
@BiplabRaut I'll likely only look at this more carefully when we get our Rome nodes, but in a quick run on Naples using gearshifft's extents_small.conf I got an average speedup of 20% when using AMD-FFTW 2.1 (sequential) compared to FFTW 3.3.8 - again, I hope these optimizations make it upstream
Hi Miguel Dias Costa, Thank you for checking our 2.1 release and sharing the new Naples results. We look forward to your Rome results and hope you will have your Rome nodes soon.
Thank you.
they pass without --enable-amd-opt for all precisions, and they pass with --enable-amd-opt for float and double, but they fail with --enable-amd-opt for long double and quad precision, are these not supported?
(This is on a server with AMD EPYC 7601 CPUs and using Easybuild's FFTW easyblock, https://github.com/easybuilders/easybuild-easyblocks/blob/master/easybuild/easyblocks/f/fftw.py, on the sources from https://github.com/amd/amd-fftw/archive/2.0.tar.gz)
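For anyone hitting the same failures, here is a minimal sketch of how a build script might pick configure flags per precision. It assumes the standard FFTW precision switches (--enable-float, --enable-long-double, --enable-quad-precision) and adds --enable-amd-opt only for float and double, following the maintainers' reply in this thread. The helper name configure_flags is hypothetical, and this is not how the EasyBuild easyblock does it.

```shell
#!/bin/sh
# Sketch: choose configure flags per precision, enabling --enable-amd-opt only
# where the maintainers say it currently works (float and double).
# configure_flags is a hypothetical helper, not part of amd-fftw or EasyBuild.

configure_flags() {
    precision="$1"
    case "$precision" in
        float)       flags="--enable-float --enable-amd-opt" ;;
        double)      flags="--enable-amd-opt" ;;            # double is FFTW's default precision
        long-double) flags="--enable-long-double" ;;        # no AMD optimizations
        quad)        flags="--enable-quad-precision" ;;     # no AMD optimizations
        *)           echo "unknown precision: $precision" >&2; return 1 ;;
    esac
    printf '%s\n' "$flags"
}

# Print the configure line for each precision instead of running it.
for p in float double long-double quad; do
    echo "./configure $(configure_flags "$p")"
done
```

Printing the commands rather than executing them keeps the sketch safe to run outside a source tree; a real build would add whatever other options the site normally uses.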