FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0
2.71k stars 660 forks source link

fftw_plan_dft_c2r_2d chopping off residual imaginary parts #159

Closed francescoboc closed 5 years ago

francescoboc commented 5 years ago

I ran into an annoying problem that was coming from the fact that the fftw_plan_dft_c2r_2d"chops off" the residual imaginary parts from the rounding error of the IFT. To solve the problem I had to switch to fftw_plan_dft_2dand work with complex arrays. Everything is explained in detail in this Stackoverflow question and answer that I wrote. I think that this behaviour of fftw_plan_dft_c2r_2dshould be mentioned in the guide, as it can lead to problems when used recursively in a sum as I did.

stevengj commented 5 years ago

The c2r transform assumes that the data is equivalent to "half" of an array with conjugate-symmetry, which corresponds to the DFT of real inputs. Since the data is actually slightly over half of the "logical" conjugate-symmetric array, this means that it must contain some redundancy.

If you give the c2r transform "random" complex inputs that don't have the requisite symmetry/redundancy, then the the input gets projected into the requisite subspace and you lose some information.

Moral: only use c2r transforms if the inputs are the output of an r2c transform or equivalent.

francescoboc commented 5 years ago

@stevengj Yes, I understand, but I think that the problem is hidden in that "or equivalent". What does it mean exactly? Because in my case I was not giving at all "random" complex inputs to c2r, I was giving an array obtained from a r2c transform, on which I did some operations (of course I was doing operations, what would be the point of doing r2c and then straight away c2r to obtain the identity operation?).

Please, have a look at my Stackoverflow post, and you will see what I mean. The code is now written using fftw_plan_dft_2d, and not the c2r, r2c anymore, but you can easily imagine how it was before (if you can't, I can rewrite the code and send it to you). And it was not working, the only way to make it work has been to consider the dh array as complex, a thing that I couldn't have done using c2r. It took me almost 2 months to figure it out.

I also found another SO question of 7 years ago which describes EXACTLY the same problem.

Moral: using c2r and r2c transforms to implement pseudospectral methods to integrate PDEs is NOT a good idea. But I insist: this should be mentioned somewhere in the guide.

stevengj commented 5 years ago

You have to do operations that preserve the conjugate symmetry, i.e. that produce an array that could have been the result of an r2c transform. For example, multiplying by a real scalar, or elementwise multiplying the results of two r2c transforms for a convolution, or summing the results of two r2c transforms.

If you are using fftw_plan_dft_2d and get real output, then you could have used c2r. If you are not getting real output (up to roundoff errors), then taking the real parts of the output is throwing away data.

It is perfectly fine to use r2c and c2r transforms to implement pseudospectral methods for PDEs involving real variables. (In fact, one such code is exactly what they were initially written for in FFTW.)

stevengj commented 5 years ago

In a quick glance at the stackoverflow code you posted, it looks like the problem may be that you are taking the second derivative (multiplying by the wavevector "q^2" in Fourier space) in a way that does not preserve conjugate symmetry, and hence will not produce real outputs. In particular, when doing spectral differentiation, you often need careful handling of the "middle" elements, or the Nyquist frequency in 1d, as I review in these notes.

Update: at second glance, your q^2 is probably okay, but it's worth taking a closer look.

stevengj commented 5 years ago

(In particular, your SO post says "Basically, because of stupid rounding errors, the IFT of the FT of a real function (in my case dh), is NOT purely real." Are you sure these are floating-point "rounding" errors, and not incorrect handling of the "Nyquist" elements? Try looking at these errors for a very coarse spatial grid — if they are rounding errors, they should still be on the order of 1e-13 times the norm of the array, but if you are mishandling of the Nyquist elements or similar then the erroneous imaginary parts should become much larger as you coarsen your discretization.)

francescoboc commented 5 years ago

@stevengj Thanks for your answer, but I don't think q^2 is the problem. I tried to remove the multiplication by q^2 and the code is still unstable (if I discard the imaginary part of dh).

Moreover, have a look at this code:

# include <stdio.h>
# include <stdlib.h>    // contains rand()
# include <math.h>      // contains M_PI and pow()
# include <complex.h>   // for complex notation of arrays
# include <omp.h>       // OpenMP for parallelization
# include <fftw3.h>     // FFTW3 for Fast Fourier Transforms
# define pi M_PI

int main(){

// define grid size and spacing
int Nx = 200;       // n of points on x
int Ny = 200;       // n of points on y
double dx = 0.5;    // bin size on x and y

// define simulation time and time step
long int Nt = 1 *pow(10,4);     // no. of time steps
double dt = 0.5;                // time step size

// total number of frames to plot (at denominator)
int nframes = Nt/100;

// define the noise for the initial condition
double rn, drift = 0.05;   // punctual drift of h(x)
srand(666);                // seed the RNG

int nthreads = 10;   // no. of threads used in the parallelization

// other variables
int i, j, nt;    // variables for space and time loops

// set the parameters for FFTW3 and OpenMP parallelization
fftw_init_threads();                            // initialize the threads for FFTW3
omp_set_num_threads(nthreads);                  // set the no. of threads for OpenMP
fftw_plan_with_nthreads(omp_get_max_threads()); // passing the no. of OMP threads to FFTW3

// declare and allocate memory for real variables
double *Linft = fftw_alloc_real(Nx*Ny);
double *Q2 = fftw_alloc_real(Nx*Ny);
double *qx = fftw_alloc_real(Nx);
double *qy = fftw_alloc_real(Ny);

// declare and allocate memory for complex  variables
fftw_complex *dh = fftw_alloc_complex(Nx*Ny);
fftw_complex *dhft = fftw_alloc_complex(Nx*Ny);
fftw_complex *Nonl = fftw_alloc_complex(Nx*Ny);
fftw_complex *Nonlft = fftw_alloc_complex(Nx*Ny);

// create the FFTW plans
fftw_plan FT_dh_dhft = fftw_plan_dft_2d ( Nx, Ny, dh, dhft, FFTW_FORWARD, FFTW_ESTIMATE );
fftw_plan FT_Nonl_Nonlft = fftw_plan_dft_2d ( Nx, Ny, Nonl, Nonlft, FFTW_FORWARD, FFTW_ESTIMATE );
fftw_plan IFT_dhft_dh = fftw_plan_dft_2d ( Nx, Ny, dhft, dh, FFTW_BACKWARD, FFTW_ESTIMATE );

// open file to store the data
char acstr[160];
FILE *fp;
sprintf(acstr, "CH2d_IE_dt%.2f_dx%.3f_Nt%ld_Nx%d_Ny%d_#f%d.dat",dt,dx,Nt,Nx,Ny,nframes);

// generate h(x,y) at initial time (random noise)
for ( i = 0; i < Nx*Ny; i++ ) {
    rn = (double) rand()/RAND_MAX;  // extract a random number between 0 and 1
    dh[i] = drift-2.0*drift*rn;     // shift of +-drift
}

// execute plan for the first time
fftw_execute (FT_dh_dhft);

// calculate wavevectors with FFTW ordering and simmetry
for (i = 0; i < Nx; i++) { qx[i] = 2.0*i*pi/(Nx*dx); }
for (i = 0; i < Ny; i++) { qy[i] = 2.0*i*pi/(Ny*dx); }
for (i = 1; i < Nx/2; i++) { qx[Nx-i] = -qx[i]; }
for (i = 1; i < Ny/2; i++) { qy[Ny-i] = -qy[i]; }

// calculate q^2 and the FT of the linear operator
for ( i = 0; i < Nx; i++ ) {
    for ( j = 0; j < Ny; j++ ) {
        Q2[i*Ny+j] = qx[i]*qx[i] + qy[j]*qy[j];
        Linft[i*Ny+j] = -Q2[i*Ny+j]*Q2[i*Ny+j];
    }
}

// TIME LOOP
for(nt = 0; nt <= Nt; nt++) {

    if((nt % nframes)== 0) {
        printf("%.0f%%\r",((double)nt/(double)Nt)*100);
        fflush(stdout);

        // write data to file
        fp = fopen(acstr,"a");
        for ( i = 0; i < Nx; i++ ) {
            for ( j = 0; j < Ny; j++ ) {
                fprintf(fp, "%.6f ", dh[i*Ny+j]);
            }
            fprintf(fp, "\n");
        }
        fclose(fp);

        // check if h is NaN
        if (isnan(dh[1])!=0) {
            printf("crashed!\n");
            return 0;
        }
    }

    // calculate h^3 in direct space
    #pragma omp parallel for
    for ( i = 0; i < Nx*Ny; i++ ) {
        Nonl[i] = dh[i]*dh[i]*dh[i];
    }

    // Fourier transform of h^3 (Nonl only contains h^3 now)
    fftw_execute (FT_Nonl_Nonlft);

    // calculate Nonlinear term (dxxh^3 - dxxh) in Fourier space 
    // NB second derivative in Fourier space is just multiplication by -q^2
    #pragma omp parallel for
    for ( i = 0; i < Nx*Ny; i++ ) {
        Nonlft[i]= -Q2[i]*(Nonlft[i]-dhft[i]);
    }

    // Implicit Euler scheme in Fourier space
    #pragma omp parallel for
    for ( i = 0; i < Nx*Ny; i++ ) {
        dhft[i] = (dhft[i] + dt*Nonlft[i])/(1.0 - dt*Linft[i]);
    }

    // transform h back in direct space
    fftw_execute (IFT_dhft_dh);

    // normalize
    #pragma omp parallel for
    for ( i = 0; i < Nx*Ny; i++ ) {
        dh[i] = dh[i] / (double) (Nx*Ny);
    }

}

// termiate the FFTW3 plan and clean the memory
fftw_destroy_plan(FT_dh_dhft);
fftw_destroy_plan(FT_Nonl_Nonlft);
fftw_destroy_plan(IFT_dhft_dh);

fftw_cleanup();
fftw_cleanup_threads();

fftw_free(dh);
fftw_free(Nonl);
fftw_free(qx);
fftw_free(qy);
fftw_free(Q2);
fftw_free(Linft);
fftw_free(dhft);
fftw_free(Nonlft);

printf("end!\n");

return 0;
}

That's basically the same code that I posted on SO, but it's parallelized with OMP and uses complex.h. The dh array should be real, and indeed if we inspect the output file we see that it contains no imaginary part: -0.016216 0.041131 -0.004177 -0.016961 -0.003971 0.045668 0.008690 ... and I can also plot it directly using the imgshow function of matplotlib/python. Great!

But if I change the line in which I normalize dh from this: dh[i] = dh[i] / (double) (Nx*Ny); to this: dh[i] = creal(dh[i]) / (double) (Nx*Ny); than the code is unstable again! (And it happens the same if I change the transform functions to r2c and c2r).

Is it there something that I am missing?

stevengj commented 5 years ago

dh array should be real, and indeed if we inspect the output file we see that it contains no imaginary part: -0.016216 0.041131 -0.004177 -0.016961 -0.003971 0.045668 0.008690 ... and I can also plot it directly using the imgshow function of matplotlib/python. Great!

Clearly it is not purely real if creal(dh[i]) changes the answers compared to dh[i]; printing only 5 digits hides the magnitude of the imaginary parts. The question is, how big are the imaginary parts? Try computing (maximum abs(cimag)) / (maximum abs(creal)) of your array for a small value of Nx and Ny (say 6).

francescoboc commented 5 years ago

@stevengj Yea, you are right, sorry. I thought that fprintf worked differently for complex arrays (it's the first time that I use complex.h). I tried to do what you suggest, and for Nx = Ny = 6 and after Nt = 1000000 (10^6) steps, I get 0.0000000000000000000000000. For a slightly bigger array, however (Nx = Ny = 25, Nt unchanged), I get 0.0000000000024354170460225. For Nx = Ny = 100 the ratio is 0.0000000000005667251389814 (smaller). And for Nx = Ny = 200 I get 0.0000000000000660306547623 (even smaller).

stevengj commented 5 years ago

What is it after 1 timestep? The magnitudes you're reporting do sound consistent with floating-point roundoff errors.

However, it sounds like the problems you are seeing have nothing to do with r2c/c2r, since you say they happen even for complex data if you discard the imaginary parts. Hence we should probably close this issue.

If that's the case — your algorithm is incredibly sensitive to inevitable tiny floating-point roundoff errors — then you may have a numerically unstable algorithm that you'll want to re-think.

francescoboc commented 5 years ago

After 1 timestep I get the following ratios for different Nx,Ny: Nx = Ny = 6 -> 0.00000000000000000000 Nx = Ny = 25 -> 0.00248675364570205993 Nx = Ny = 100 -> 0.00000000000000025229 Nx = Ny = 200 -> 0.00000000000000028818

Well, yes, these problems happen when I discard the residual imaginary parts of the function that I am integrating, but what I am saying is that when I use r2c/c2r transforms, then it is impossible to fix the issue, because the c2r function does not keep these imaginary parts.

For what concerns the sensibility of my algorithm, I tried other, more sophisticated ones, like Exponential Time DIfferencing (ETD) and ETD Runge-Kutta 2nd order (ETDRK2) [https://doi.org/10.1006/jcph.2002.6995]. They were still crashing, a bit later in time since they are less "sensible", but still crashing.

I also tried to change the equation, since I thought that maybe the problem is the h^3 in the nonlinear term. I tried Swift-Hohemberg and Kuramoto-Sivashinsky, but the bug was still there.

I did not find a way to avoid the instability when using r2c/c2r and summing recursively dhft and a term obtained as FT(IFT(dhft)).

stevengj commented 5 years ago

If you are getting 0.00248675364570205993 then that is not a roundoff error and you have a bug (but maybe only for the odd-N case). You might also monitor the maximum imaginary part over time — it could be that at some point you have a much larger imaginary part even for the even-N case, which would help pinpoint your problem.

If you have a numerical instability in your approach, then just switching to another integrator will probably not fixed it. You have some fundamental problem in your formulation if it is sensitive to errors on the order of 2e-16, and you need to re-think what you are doing at a more basic level. If you have a bug where sometimes you get a large imaginary part due to some symmetry-breaking in your algorithm, then you also need to find and fix that.

Not using r2c/c2r is only "fixing" your problem at a very superficial level; there is no error in these transforms.

francescoboc commented 5 years ago

Yes, sorry it is my fault, It was a mistake to run the program for Nx = Ny = 25. As it is written now, my program can only handle even N's. Indeed, for even numbers, I get: Nx=Ny=24 -> 0.00000000000000014039 Nx=Ny=26 -> 0.00000000000000011522

The unstable behaviour that I described in the previous posts happens for even N's, where there are only roundoff errors in the imaginary part of dh.

I am not claiming that there is an error in the r2c/c2r transforms, but I think that the fact that these functions are discarding the residual imaginary part in dh, makes a pseudospectral approach, which basically involves a sum of dhft and FT(IFT(dhft)), unstable after a large number of iterations.

Sorry if I insist, but my code is just a straight-forward implementation of IE/ETD/ETDRK2, so I really don't see where I could have a fundamental problem in the approach. Besides, as I already mentioned, there is at least another user reporting an analogous problem when using r2c/c2r to implement a pseudospectral method.