Closed chi86 closed 3 years ago
Yes, an FFTW_IDENTITY kind could be convenient here!
In the meantime, you can already achieve what you want by using the advanced or guru interfaces.
Thank you for the hint! I have now implemented it with "fftw_plan_many_r2r", yet I noticed that the case with a transform in both directions is faster.
When I implement the plans as follows:
fftw_f = fftw_plan_r2r_2d(imax,kmax, *in, *out, FFTW_R2HC,FFTW_R2HC, FFTW_MEASURE);
fftw_b = fftw_plan_r2r_2d(imax,kmax, *out, *in, FFTW_HC2R,FFTW_HC2R, FFTW_MEASURE);
it takes "real 1m43.079"
Whereas if I use the "plan_many", reading
kind[0] = FFTW_R2HC; fftw_f = fftw_plan_many_r2r(1,&kmax,imax, *in,&imax, 1,kmax, *out,&imax, 1,kmax, kind, FFTW_MEASURE);
kind[0] = FFTW_HC2R; fftw_b = fftw_plan_many_r2r(1,&kmax,imax, *out,&imax, 1,kmax, *in,&imax, 1,kmax, kind, FFTW_MEASURE);
it takes "real 2m10.332s".
I prescribed the same imax and kmax for both cases.
I would have expected a shorter runtime from computing the transform in only one direction; however, it took longer.
Is there a faster way?
The final goal of these tests is to benchmark FFTW against VFFTPK.
I briefly looked through the code and saw, if I am not mistaken, that the "fftw_plan_r2r_2d" plan is implemented as a special form of "plan_many_r2r". Consequently, with a proper implementation (which I apparently did not achieve), the "fftw_plan_many_r2r" version, being essentially just a transform in one direction, should be faster.
I think you want &kmax and not &imax for the inembed and onembed parameters.
Are you timing only the execution of the FFTs (fftw_execute), or are you also timing the plan creation? Only the execution time is meaningful to compare.
Thanks, I fixed that! The result is the same: either choice, imax or kmax, gives me the analytical solution (I solve a simple Poisson equation).
Indeed the timing includes both, yet in order to minimize the impact of the plan creation I create it only once and then run the execution in a loop 1E6 times.
Can you time them separately? Just add calls to gettimeofday or similar in your code.
Ok I did that with the following outcome:
fftw_plan_many_r2r
Dt plan create 1038 mus
Dt plan execute 1381166 mus
real 0m1.383s
user 0m1.381s
sys 0m0.001s
Plan r2r_2d
fftw_plan_r2r_2d
Dt plan create 3593 mus
Dt plan execute 468257 mus
real 0m0.472s
user 0m0.472s
sys 0m0.000s
What size transform are you benchmarking?
Each data set to be transformed will be at least 1024 elements long, and around 5E5 of these are stored in an array. For my purposes I will repeat this up to 1E6 times.
I made an example of the code and put it in a repo, to clarify what I want to do. Thank you very much for your help!
I tried your situation with the included tests/bench benchmarking program, doing 65536 transforms of size 1024 (R2HC) (specified as k1024f*65536), vs. a 65536x1024 2d R2HC transform (specified as k65536fx1024f), running single-threaded in double precision on my 2.7GHz Intel Core i7 laptop:
Problem: k65536fx1024f, setup: 43.50 s, time: 5.72 s, ``mflops'': 762.62086
Problem: k1024f*65536, setup: 70.00 us, time: 274.65 ms, ``mflops'': 6108.6471
As expected, performing the transforms along just the rows was much faster (20x faster; time is the time for 1 transform, collected over many trials).
Can you try running the FFTW bench program?
Here is the output from the tests/bench benchmarking program on a desktop i5 at 4.1GHz:
Problem: k65536fx1024f, setup: 23.94 s, time: 3.34 s, ``mflops'': 1306
Problem: k1024f*65536, setup: 36.03 ms, time: 199.40 ms, ``mflops'': 8413.8
As you expected, for me too the transform along the rows is much faster (by a factor of 16.75).
Yet this means I messed up the implementation. Thanks for your support!
I have now fixed some issues with the allocation of the array, and I am at a point where the fftw_plan_many_r2r is twice as fast as the fftw_plan_r2r_2d.
I looked briefly through the bench.c code and noticed something I can't understand. For the r2r api_simple, the two dimensions are sz->dims[0].n and sz->dims[1].n, whereas for api_many the n from static int *mkn(bench_tensor *t) is basically t->dims[0].n and the "howmany" is vecsz->dims[0].n. Shouldn't it be vecsz->dims[1].n for the "howmany"? Otherwise it is just done 1024 times instead of 65536, or am I wrong?
There is no vecsz->dims[1].n; the vecsz->dims array is a 1-element array if you have a single howmany loop, independent of the length of the t->dims array (whose length is the dimensionality of the transform).
In particular, t and vecsz correspond to dims and howmany_dims, respectively, in the guru interface.
Ah thank you for the clarification!
Hi, I would like to perform an FFT in one direction on a 2D dataset. Currently I use
fftw_f = fftw_plan_r2r_2d(imax,kmax, *in, *out, FFTW_R2HC,FFTW_R2HC, FFTW_MEASURE);
fftw_b = fftw_plan_r2r_2d(imax,kmax, *out, *in, FFTW_HC2R,FFTW_HC2R, FFTW_MEASURE);
for my forward and backward transforms, where in & out are 2D arrays. Yet this performs an FFTW_R2HC & FFTW_HC2R in both directions. It would be convenient to set the second r2r kind so that there is no transform in that direction. According to 4.3.6 Real-to-Real Transform Kinds, there is currently no kind with this functionality. Is there another way to implement something like this?