FFTW plan generation : C versus Python context

robin-cls commented 2 years ago

Hi everyone,

I need general information about the FFTW planner. This is kind of a theorical question so I will try to put as much context as I can.

Context

I have a legacy C code that uses FFTW 3.3.4 to compute 1D DFT. This code does the following steps :

Generation of the input and output arrays using _fftwmalloc : they are filled with 0
Generation of the direct and inverse transforms plans using _fftw_plan_dft_r2c1d and _fftw_plan_dft_c2r1d respectively
Filling of the input array
Plan execution

This has been working well for a number of years. However, we recently decided to offer Python bindings for calling this code using pybind. After seeing very subtle differences between the Python and C results, we dived into the code and observed something unexpected with the plan generated from the 0-arrays :

Launching the code using the C program and printing the plan using _fftw_sprintplan shows nerdy but expected plan

zp_fftw_directplan: (rdft2-ct-dit/16
(hc2c-direct-16/60/1 "hc2cfdftv_16_avx"
(rdft2-r2hc-direct-16 "r2cf_16")
(rdft2-r2hc01-direct-16 "r2cfII_16"))
(dft-vrank>=1-x8/1
(dft-ct-dit/32
  (dftw-direct-32/124 "t1fv_32_avx")
  (dft-direct-32-x32 "n1fv_32_avx"))))
zp_fftw_inverseplan: (rdft2-ct-dif/2
(hc2c-direct-2/4/1 "hc2cbdftv_2_avx"
(rdft2-hc2r-direct-2 "r2cb_2")
(rdft2-hc2r10-direct-2 "r2cbIII_2"))
(dft-ct-dit/8
(dftw-direct-8/28 "t1bv_8_avx")
(dft-vrank>=1-x8/1
  (dft-ct-dit/16
    (dftw-direct-16/60 "t1bv_16_avx")
    (dft-direct-64-x16 "n1bv_64_avx")))))

However, launching the program with the same inputs with Python prints zp_fftw_directplan: (null). This is unexpected because the plan is executed properly afterwards, so it is not NULL. We also have a check that stops the program if the plan is NULL.

This could be a printing error because the documentation states that the print function is for entertainment only. However, following this weird behavior, we have done the following :

Launch the C program and write wisdom on disk
Recompile the C program to generate the plan using WISDOM_ONLY
Regenerate the Python bindings so that the plans are generated with WISDOM_ONLY

The results : the C program give a slightly different result using WISDOM_ONLY but works. However, the Python bindings does not seems to load the wisdom propely (return code of _fftw_import_wisdom_fromfilename is 0). What is more strange is that even though the wisdom is not properly loaded, the plan generation does not return NULL.

Question

At this point, I suspect that launching the program from the Python bindings sets up an environment that conflicts with FFTW3 plan generation. As I would like to seek advice from more advanced pybind users, so I need a little more info on the FFTW3 planner to give them an hint for debugging :

Apart from the arrays alignment, what does the planner uses (number of avalaible CPUs, process memory, ...) for generating the plans ?

stevengj commented 2 years ago

zp_fftw_directplan: (null) makes it sound like you are having some difficulty with pybind, and are somehow passing corrupted data between Python and C. I haven't used pybind myself so I can't give you more help there.

The plans are generated by runtime measurements, so in general they could be different each time you run the code. They don't explicitly use CPU metrics (except for the existence of SIMD features).

robin-cls commented 2 years ago

Thanks, I already checked the data passed to pybind and it seems OK. However, after discussing with someone familiar with pybind, they hinted me towards the compilation options used for the C program versus the Python-binded program. I use cmake and the dependencies are recompiled so I think some optimization flags are missings from the Python-binded program. I am looking into this and will keep you updated.

robin-cls commented 2 years ago

I am back with a minimal example that I think illustrates my problem in a practical way.

In the Git repository, we have a function foo() that uses an hard-coded vector of size 300 and computes its DFT. I followed what is in the tutorial but with real-to-complex DFT instead. This function is duplicated in two sources files : main.c that is built into a Main program, and bindings.cpp that is built into a Python module. Both outputs are built using cmake in a conda environment.

To compare the results, the foo() function prints the FFT plan and the ouput complex array.

When launching the foo function from both module and program, I get differences in the output vector (left is the C program call, right is the Python module call) :

Because the input vector for the FFT is hard-coded, I don't think there is a data corruption from pybind. Any idea why we see this behavior ?

Note : the hard-coded vector has been generated with random numbers with numpy, then scaled to 100. Note 2: taking a small vector (10 elements) shows no differences.

pybind version: 2.9.2 fftw3 version: 3.3.4

FFTW / fftw3

FFTW plan generation : C versus Python context #289