FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0
2.67k stars 652 forks source link

FFTW plan generation : C versus Python context #289

Open robin-cls opened 2 years ago

robin-cls commented 2 years ago

Hi everyone,

I need general information about the FFTW planner. This is kind of a theorical question so I will try to put as much context as I can.

Context

I have a legacy C code that uses FFTW 3.3.4 to compute 1D DFT. This code does the following steps :

This has been working well for a number of years. However, we recently decided to offer Python bindings for calling this code using pybind. After seeing very subtle differences between the Python and C results, we dived into the code and observed something unexpected with the plan generated from the 0-arrays :

This could be a printing error because the documentation states that the print function is for entertainment only. However, following this weird behavior, we have done the following :

The results : the C program give a slightly different result using WISDOM_ONLY but works. However, the Python bindings does not seems to load the wisdom propely (return code of _fftw_import_wisdom_fromfilename is 0). What is more strange is that even though the wisdom is not properly loaded, the plan generation does not return NULL.

Question

At this point, I suspect that launching the program from the Python bindings sets up an environment that conflicts with FFTW3 plan generation. As I would like to seek advice from more advanced pybind users, so I need a little more info on the FFTW3 planner to give them an hint for debugging :

Apart from the arrays alignment, what does the planner uses (number of avalaible CPUs, process memory, ...) for generating the plans ?

stevengj commented 2 years ago

zp_fftw_directplan: (null) makes it sound like you are having some difficulty with pybind, and are somehow passing corrupted data between Python and C. I haven't used pybind myself so I can't give you more help there.

The plans are generated by runtime measurements, so in general they could be different each time you run the code. They don't explicitly use CPU metrics (except for the existence of SIMD features).

robin-cls commented 2 years ago

Thanks, I already checked the data passed to pybind and it seems OK. However, after discussing with someone familiar with pybind, they hinted me towards the compilation options used for the C program versus the Python-binded program. I use cmake and the dependencies are recompiled so I think some optimization flags are missings from the Python-binded program. I am looking into this and will keep you updated.

robin-cls commented 2 years ago

I am back with a minimal example that I think illustrates my problem in a practical way.

In the Git repository, we have a function foo() that uses an hard-coded vector of size 300 and computes its DFT. I followed what is in the tutorial but with real-to-complex DFT instead. This function is duplicated in two sources files : main.c that is built into a Main program, and bindings.cpp that is built into a Python module. Both outputs are built using cmake in a conda environment.

To compare the results, the foo() function prints the FFT plan and the ouput complex array.

When launching the foo function from both module and program, I get differences in the output vector (left is the C program call, right is the Python module call) : image

Because the input vector for the FFT is hard-coded, I don't think there is a data corruption from pybind. Any idea why we see this behavior ?

Note : the hard-coded vector has been generated with random numbers with numpy, then scaled to 100. Note 2: taking a small vector (10 elements) shows no differences.

pybind version: 2.9.2 fftw3 version: 3.3.4