FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0
2.67k stars 652 forks source link

Memory leak with fftwf? #281

Closed mwk088 closed 2 years ago

mwk088 commented 2 years ago

Hello, I am seeing a memory leak with my code, and valgrind is pointing to the FFTW function, however, I am unsure if this is true. I have pasted the code in which the plans are created, executed and destroyed to view and see if I am handling FFTW incorrectly. Also, this is a multithreaded application I am running.

Any help would be greatly appreciated

Thank you Mark

boost::mutex Framework_Fft::fft_scope_mutex;

void Framework_Fft::initialize()
{
   plan_in = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * 512);
   plan_out = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * 512);

   for (auto const &plan : mFftwfPlans)
   {
      mFftwfPlans[plan.first] = fftwf_plan_dft_1d(plan.first,
                                                  plan_in,
                                                  plan_out,
                                                  FFTW_FORWARD,
                                                  FFTW_PATIENT);
   }
   is_initialized = true;
}

void Framework_Fft::FftComplex(
    size_t fft_size,
    const framework_complex_internal *input_buffer,
    framework_complex_internal *output_buffer)
{
   // Validate FFT fft_size
   if (fft_size != 64 && fft_size != 128 && fft_size != 256 && fft_size != 512)
   {
      throw std::invalid_argument("Bad FFT fft_size");
   }
   boost::lock_guard<boost::mutex> lock(fft_scope_mutex);

   if (!is_initialized)
   {
   initialize();
   }

   bool align_needed = (uint64_t)input_buffer % 16 != 0;
   fftwf_complex *fft_in = (fftwf_complex *)input_buffer;
   fftwf_complex *fft_out = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * fft_size);

   // We may need to align the buffer that is input to fftwf. This makes FFT faster.
   if (align_needed)
   {
      fft_in = (fftwf_complex *)fftwf_malloc(sizeof(fftwf_complex) * fft_size);
      memcpy(
          fft_in,
          input_buffer,
          sizeof(fftwf_complex) * fft_size);
   }

   fftwf_execute_dft(mFftwfPlans[fft_size], fft_in, fft_out);

   size_t hafl_size = fft_size / 2;
   // TODO: Why do we need to do this half copy? Is there an easier way?
   // First half from fftwf_execute goes into 2nd half of output buffer
   memcpy(
       output_buffer + hafl_size,
       fft_out,
       hafl_size * sizeof(fftwf_complex));

   // Second half of fftwf_execute goes into first half of output buffer.
   memcpy(
       output_buffer,
       fft_out + hafl_size,
       hafl_size * sizeof(fftwf_complex));
   if (align_needed)
   {
      fftwf_free(fft_in);
   }
   fftwf_free(fft_out);

   // if (is_initialized)
   // {
   // for (auto const &plan : mFftwfPlans)
   // {
   //    fftwf_destroy_plan(mFftwfPlans[plan.first]);
   // }
   // fftwf_forget_wisdom();
   fftwf_cleanup();

}
stevengj commented 2 years ago

If you want to free all memory allocated by FFTW, you should call fftwf_cleanup() at the end of your program, as documented in the manual. You also need to first call fftwf_destroy_plan, which is currently commented out in your code above?

mwk088 commented 2 years ago

Our program runs indefinitely. That we are seeing a continual memory growth over time and that valgrind is pointing us to fftw, hence our attempt to use cleanup to some degree of regularity.

If the fft plan continuously adds wisdom over time, how do we “cleanup” with something that runs forever?

stevengj commented 2 years ago
  1. You need to call fftwf_destroy_plan after you are done with a plan, otherwise you have a memory leak. (Presumably you should do this in the destructor for Framework_Fft.)
  2. Although FFTW allocates some memory (e.g. "wisdom") internally when you create a plan, this memory is re-used when re-creating a plan of the same size. So if you are creating plans over and over of the same sizes (and destroying them) then there is no growth in total memory usage.
mwk088 commented 2 years ago

Thanks for the feedback. Currently, we create a plan once inside 'initialize' upon startup and use that plan over and over with FftComplex being executed on streaming data constantly. I have updated what the code now looks like above. And below is what Valgrind tells me about the fftwf leak. Is this an incorrect reporting by Valgrind?

==20853== 19,080 (96 direct, 18,984 indirect) bytes in 4 blocks are definitely lost in loss record 14,519 of 14,694
==20853==    at 0x4C3C486: memalign (vg_replace_malloc.c:1517)
==20853==    by 0x14C83BE4: fftwf_malloc_plain (in /usr/local/lib/libfftw3f.so.3.3.2)
==20853==    by 0x14D60E21: fftwf_mkapiplan (in /usr/local/lib/libfftw3f.so.3.3.2)
==20853==    by 0x14D67B25: fftwf_plan_many_dft (in /usr/local/lib/libfftw3f.so.3.3.2)
==20853==    by 0x14D66F76: fftwf_plan_dft (in /usr/local/lib/libfftw3f.so.3.3.2)
==20853==    by 0x14D66D85: fftwf_plan_dft_1d (in /usr/local/lib/libfftw3f.so.3.3.2)
==20853==    by 0x566E72A7: Framework_Fft::initialize() (Framework_Fft.cc:17)
==20853==    by 0x566E7380: Framework_Fft::FftComplex(unsigned long, std::complex<float> const*, std::complex<float>*) (Framework_Fft.cc:40)
==20853==    by 0x55ACDBA1: CPhy_Dft_Ofdm_impl::Phy_Dft_Ofdm_ac(CState_BurstStateParams_impl*) (CPhy_Dft_Ofdm_impl.cc:202)
==20853==    by 0x55B6449C: CState_NonHt_Ltf_Ofdm_impl::Process(unsigned long, std::complex<float>**, CState_BurstStateParams_impl*) (CState_NonHt_Ltf_Ofdm_impl.cc:507)
==20853==    by 0x55B68DFC: CStateHandler_NonHt_Plcp_Ofdm_impl::Process(unsigned long, unsigned long, std::complex<float>**, CState_BurstStateParams_impl*) (CStateHandler_NonHt_Plcp_Ofdm_impl.cc:191)
==20853==    by 0x55805DDB: gr::wifi_ac::CGRHandler_Ofdm_Plcp_impl::handler(boost::shared_ptr<pmt::pmt_base>, std::vector<void*, std::allocator<void*> >) (CGRHandler_Ofdm_Plcp_impl.cc:293)
stevengj commented 2 years ago

And below is what Valgrind tells me about the fftwf leak. Is this an incorrect reporting by Valgrind?

As I said, the plan creation process allocates some memory for cached information. (Indeed, Valgrind is reporting the memory as being allocated by fftwf_plan_dft_1d, which you say you are only calling at the beginning.) This is only de-allocated if you call fftwf_cleanup() (which you would do at the very end of your program, after all of your plans have been deallocated, as explained in the manual). (Of course, at that point there is no point in doing the cleanup except to silence valgrind, since the OS will deallocate everything for you when your program executes.)

Don't call fftwf_cleanup each time you execute an FFT! Do it at the end after you are done with FFTW and have destroyed all your plans!

In any case, this is only done during planning, not during fftwf_execute, so it cannot be responsible for "continual memory growth over time" if you only construct plans once at the beginning of your program.

stevengj commented 2 years ago

PS. No need for a mutex in FftComplex, since fftwf_execute is thread safe. You only need a mutex in planning routines, e.g. in initialize and any destructor.

mwk088 commented 2 years ago

Much appreciated, thank you for the help.