FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0
2.7k stars 660 forks source link

Thread Safe Planner #16

Closed x42 closed 8 years ago

x42 commented 10 years ago

http://www.fftw.org/fftw3_doc/Thread-safety.html mentions

We do not think this should be an important restriction

I hope I can convince you otherwise.

Problem Description

fftw3 is used by a variety of audio plugins.

Those plugins are loaded into the host's memory-space (usually an audio workstation). The host has limited control of what the plugin does internally, and the plugins do not know about each other.

There is no way to ensure that two independent plugins which are linked against libfftw do not run the shared planner simultaneously. Nor is there a possibility to control this on host application level.

When two independent plugins create fftw plans the application usually segfaults or similar undesired effects manifest.

Possible solutions for this include:

  1. Statically link plugins against libfftw. Every plugin will have its own copy. The plans are not shared with other plugins (which is mostly fine). This still requires a bit of special attention: (fftw symbol visibility needs to be overridden for static links and the plugin must protect its planning routings for multiple instances of itself). Furthermore distributors must honor that (special built of fftw + static link). -- It is very unlikely that both plugin-authors and various gnu/linux-distributors do get this right (most distros dislike static linking) for the growing number of audio-plugins using fftw.
  2. process separate all plugins in the host. That is not a viable option for Digital Audio Workstations where low-latency is important, context switches (particularly realtime thread) heavy and inter-process communication does not scale (compared to shared memory), especially so if the DAW does not limit audio track or channel count.
  3. Discourage use of fftw for audio-plugins or even refuse to load plugis using it in the host. -- not the best idea :)
  4. Ship a special (ABI compatible) build of libfftw with the host application which protects the planner. Plugins in the same memory space will use the already loaded library. This requires patching libfftw, but when doing so... why not do it upstream directly. Otherwise it has similar issues as (1).

    Discussion

The issue at hand is not limited to audio-application, there are likely other applications with similar problems out there (gnu-octave comes to mind, but I don't know for certain).

As the Thread-safety page mentions, it's as simple as

wrap a semaphore lock around any calls to the planner

Is there some good reason why libfftw does not do this by default?

Existing applications should not be affected by this (they're not supposed to call the planner from different threads), but that change would make all the difference for multi-threaded plugin hosts.

I suppose it could be a bit of work to wrap all planner entry-points with a semaphore, yet there may be a neat simple solution using #define.

I'll be happy to look into this, but before going that way, I'd like to ask if such a change would be accepted by fftw or if there is an even better solution planned for future version that will make fftw's planner thread-safe.

yours truly, robin - for the linux-audio community and for himself


Notable audio plugins using fftw3: http://calf.sourceforge.net/ http://factorial.hu/plugins/lv2/ir http://guitarix.sourceforge.net/ http://breakfastquay.com/rubberband/ http://plugin.org.uk/ http://zynaddsubfx.sourceforge.net/ https://github.com/x42/meters.lv2 ...

Notable affected plugin hosts: http://ardour.org/ http://qtractor.sourceforge.net/ https://github.com/falkTX/Carla/ ...

see also https://community.ardour.org/node/8271

jrigg commented 10 years ago

As an audio engineer who relies on this stuff in my work, I'll add my voice to the request to consider this issue. It's been causing serious problems here.

stevengj commented 10 years ago

I'm not opposed to this. There's no real performance reason not to put a mutex lock around the planner etc. as far as I can tell.

We might try to be bit careful with library dependencies, because I don't want libfftw3 to be suddenly dependent on libpthread. But that is manageable: we can just have fftw_init_threads install hook functions pointers (initially NULL) in the planner for locking/unlocking a mutex. This would not really solve the problem in general, unfortunately.

stevengj commented 10 years ago

Probably we should just bite the bullet and add a libpthread dependency to libfftw3 (on Unix). For the use-case here, it seems like you need thread-safety regardless of whether the threaded FFTW is being used.

(When FFTW was created back in 1997, there were still a wide variety of threading libraries that we needed to support. pthreads, Solaris threads, Mach threads.... But now life is simpler because there are only two that matter, pthreads and Windows threads, so adding a dependency on one is not so bad.)

stevengj commented 10 years ago

On Unix, if we add a libpthread dependency we can use a static mutex initializer. On Windows, it looks like there is a hack to accomplish similar functionality.

x42 commented 10 years ago

There's already pthread/posix and Windows specific mutex code in threads/threads.c - so no new dependency is needed, is it?

stevengj commented 10 years ago

@x42, the trick here is that we need a single global mutex lock on the global planner state, and that global mutex lock needs to get initialized exactly once, in a thread-safe way, regardless of how many threads are linking to FFTW. That's why you need to use a static initializer or an emulation thereof.

Also, the threads/threads.c functions are only used in libfftw3_threads, not in libfftw3. The former currently has a libpthread dependency (on Unix), but not the latter.

x42 commented 10 years ago

OK. so the equivalent to PTHREAD_MUTEX_INITIALIZER for windows (and other non posix OS).

stevengj commented 10 years ago

Fortunately, the only non-POSIX system that matters these days is Windows. (And of course our autoconf script can just disable the mutex on non-POSIX non-Windows systems, if configure runs at all.)

The unfortunate thing here is that our Unix linking instructions, which haven't changed in over a decade, will now need to be changed: programs will need to link -lfftw3 -lm + whatever is needed to link pthreads. Linking pthreads used to be insanely complicated and non-portable, but maybe everything just uses -pthread now. (Or rather, all the other Unices have mostly died off.)

Still, it will potentially break a lot of people's Makefiles. (Although as long as they are linking the shared-library version of FFTW, I guess the shared-library dependency mechanism on GNU/Linux will take care of it silently for them. People who compile libfftw3.a and link it statically are in for a surprise, though.)

x42 commented 10 years ago

..no real performance reason..

indeed. the overhead of locking/unlocking a mutex is negligible.

..Unix linking instructions..

-pthead is a compiler/linker flag understood by almost all compilers these days, but yes: LD_FLAGS will have to change. Alternatively a --no-thread-safe-planner configure option could be added.

stevengj commented 10 years ago

It's not so negligible that I would want locking/unlocking in an inner loop. Moreover, we are talking about a single global lock here, which forces serialization, so you wouldn't want this for any performance-critical function like plan execution. But none of the functions that need locking here are inner-loop functions (plan execution is already documented to be thread-safe), so it doesn't matter.

We should certainly have a --disable-threadsafety flag as an option, but it will still require people to change their builds if they want to use it; most users would be better off just adding -pthread to their makefile.

x42 commented 10 years ago

in that case, just adding it to fftw.pc.in may do the trick

stevengj commented 10 years ago

Partial(?) list of functions that need locking, for future reference:

stevengj commented 10 years ago

@x42, we will certainly add it to fftw.pc, but not everyone uses pkgconfig.

matteo-frigo commented 10 years ago

Would something like this solve the problem:

This "solution" works as long as any thread that depends upon a thread-safe planner calls init_threads() first.

stevengj commented 10 years ago

I guess the question is whether it is feasible to expect FFTW-using audio plugins (and similar) to link -lfftw3_threads and call fftw_init_threads(). It doesn't seem crazy to me, but I don't know the audio-plugin space.

x42 commented 10 years ago

What is the problem with libfftw becoming dependent on pthread on unices that have pthread by default?

The only downside I see is that some single-threaded applications would have to add -pthread (or -lpthead) to their link flags and most won't notice nor care if it's picked up via pkg-config or autoconf et al anyway.

As for the implementation: How about adding abstract fftw_mutex_un/lock() methods to libfftw itself and calling those at every planner entry and exit point? On GNU/Linux and OSX (and other unices which have pthread by default) fftw_mutex_un/lock() will simply expand to pthread calls at compile-time, on Windows it'll call InterlockedCompareExchangePointer etc. On small embedded or other systems it would may become a noop (or we can add a spinlock or a custom mutex implementation).

If pthread by default is a problem. This approach would also lend itself to create libfftw3, libfftw3f, libfftw_thread, libfftw3f_thread. All a plugin author would have to do is change the lib-name to link against (no code changes needed).

-=-

A solution that does not require plugin code to change is preferable.

There are a lot more plugins than hosts and plugins are killing us already ("Look Ma, I made a new plugin.." crash). Host authors are usually more experienced and conscientious. The issue at hand is rarely caught by plugin-authors, since it only manifests when multiple instances are run or different plugins are combined in a host.

However, if fftw3_threads and fftw3 are ABI compatible and every planner call does preemptively call init_threads() (inside fftw3_threads) it would still work for cases where the plugin-author takes no action: On Linux and OSX: the host links against it plugins inherit its symbols. On Windows it depends (linking vs GetProcAddress() - but that's a different issue).

stevengj commented 10 years ago

The problem is potentially requiring tens of thousands of people to change their Makefiles, which is not a step we take lightly. But I'm also very sympathetic to your argument that plugin authors may not be sophisticated, and FFTW should be safe by default if possible. (At least failing to include -pthread will lead to a "noisy" failure.)

matteo-frigo commented 10 years ago

Would something like this work:

x42 commented 10 years ago

Yes, that sounds like a plan and an elegant solution.

Theoretically that would also allow for libfftw to set those callbacks itself for some platforms, one day.

It does not define whose responsible it is to set those functions. In case the plugin-host itself does not use fftw, but only independent plugins (which are loaded dynamically) use it, that could still fall over, but in the context of pro-audio, the host should link against libfftw just to set the callbacks preemptively.

A suggestion for the API: it should be possible to set both hooks atomically (just in case two independent threads try at the same time). Besides, it does not really make sense to install only one hook.

typedef void (*FFTWPlannerLockCallback)(void *arg);
/**
 * set before_planner_hook and after_planner_hook callback
 * functions if they were not previously set.
 *
 * @param b callback function to be called before starting a fftw plan or NULL to unset.
 * @param a callback function to be called after completing a fftw plan or NULL to unset.
 * @param replace if nonzero, set callbacks even if hooks are already defined.
 * @return 0 if hooks were installed, otherwise error-code 
 */
int fftw_set_planner_hooks (
        FFTWPlannerLockCallback b,
        FFTWPlannerLockCallback a,
        int replace,
        void *arg
);
ReTenant commented 9 years ago

ping!

falkTX commented 9 years ago

any news on this?

matteo-frigo commented 9 years ago

See 113e1086966fdff4c172672753cc880e6bc74d3d

I am still missing the arg parameter to the two hooks. Do we want one or two args?

jrigg commented 9 years ago

Has there been any more progress on this?

harryhaaren commented 9 years ago

Hi All,

A ping on this issue - this issue is discussed periodically on IRC in various linux-audio related channels, and would be really nice to have fixed.

I'm currently looking towards using a different library for FFT for an upcoming project due to this issue possibly causing crashes in software designed for on-stage live performance.

The solution proposed with both callbacks set in one function looks a good fix to me. Cheers, -Harry

magnetophon commented 9 years ago

I'll add my voice to the choir! :)

rdolbeau commented 9 years ago

WRT a simple global lock - there's no need for pthread or anything heavy. A simple by-the-book lock should be enough (... I think). GCC defines intrinsics for atomic operations (and they are available on most architectures), so it's quite trivial to implement a ticket lock on a pair of statically-initialized to 0 integer. I've pushed a trivial example on https://github.com/rdolbeau/fftw3/tree/mutualexclusion. Just add "--enable-mutualexclusionplan" to configure, and the planner should be "thread safe" (as in, serialized :-). Beware: active wait, and not really tested at all.

matteo-frigo commented 9 years ago

Current status on this issue:

The next fftw-3.3.5 will support fftw_set_planner_hooks(before, after, arg). Before planning, fftw calls before(arg), and after planning it calls after(arg).

fftw_set_planner_hooks() is not thread safe; the user is responsible to ensure that it is called once. @rdolbeau : it is a bad idea to add a gcc dependency (__sync* intrinsics) so deep in FFTW---whatever the API we end up with, we must support it everywhere (including e.g. Windows without pthreads).

Please offer feedback on whether or not this solves the problem. If it does I'll document it in the manual, and if it does not I will remove the hook completely.

rdolbeau commented 9 years ago

@matteo-frigo The __sync* intrinsics are supported by ICC and CLANG as well. For windows, there is a similar mechanism (InterlockedAdd ?). Basically, you only need a way to do "x = * y; (* y)++" atomically to implement a ticket lock. And it doesn't require any library support beyond the compiler itself, so it avoids a dependency on pthread. Hooks can work, but require application-level support.

x42 commented 9 years ago

A hook for locking offers a nice flexible API, but that requires each and every application to explicitly use it. Also in the Host+Plugins case it can quickly end up being a mess: Who is responsible to register the hooks? (e.g. A host is not using fftw, but some plugins loaded into the host do. Plugins can be un/reloaded.)

I'm all for hooks, but they should by default point to an internal simple lock (a ticket spinlock is fine. I like Romain's implementation.)

The reasoning is as follows:

matteo-frigo commented 9 years ago

The spinlock implementation is a bad idea for at least two reasons.

First, the planner may take minutes, and the last thing you want to do is spin while the planner is running.

Second, we aim to be much more portable than "icc" and "clang". Look at kernel/cycle.h for a partial list of systems where the lock hack is expected to work.

The current status is that the fftw core (e.g., non-simd) requires ANSI-C and nothing else (no threads, no c99, no nothing), and the FFTW API works in the same way in all systems. It will take an extraordinary amount of evidence to convince us to break this property.

matteo-frigo commented 9 years ago

How about this.

To the threaded api, we add a call fftw_make_planner_thread_safe() (or something) that atomically installs the appropriate hooks, and is guaranteed to be atomic and idempotent.

Plugins that care about thread safety will have to call fftw_make_planner_thread_safe(). The fftw core does not depend upon pthreads.

If we go down this path, I will remove the hooks for lack of a use case.

davisking commented 9 years ago

Why is there global state in FFTW at all? By that I mean, why doesn't the interaction with FFTW begin by allocating some kind of context structure which is passed to all the other FFTW functions and contains the "global" state. This would avoid any need for behind the scenes locks.

matteo-frigo commented 9 years ago

The problem is not to argue what made or didn't make sense in 1997 when fftw started. The problem is to decide how to move forward productively from where we are.

On Mon, May 25, 2015 at 5:59 PM, Davis E. King notifications@github.com wrote:

Why is there global state in FFTW at all? By that I mean, why doesn't the interaction with FFTW begin by allocating some kind of context structure which is passed to all the other FFTW functions and contains the "global" state. This would avoid any need for behind the scenes locks.

— Reply to this email directly or view it on GitHub https://github.com/FFTW/fftw3/issues/16#issuecomment-105324764.

davisking commented 9 years ago

Obviously don't break the existing API. But, for example, why not add a version of fftw_plan_dft_1d() that doesn't have any global state? Call it fftw_plan_dft_1d_r() and make it either take an additional context object or make it simply allocate and deallocate whatever it needs internally.

x42 commented 9 years ago

fftw_make_planner_thread_safe() is so far the best option on the table. It still requires application level support but that may be an acceptable compromise.

Why can't this be done by default if fftw3 is compiled with --enable-threads?

x42 commented 9 years ago

@davisking fftw_plan_dft_1d and fftwf_plan_r2r_1d and fftwf_plan_dft_r2c_1d and probably a few more... special casing a single plan is not a solution.

A similar workaround is just copy all of fftw into your project (the global state won't be shared with other projects that use system-wide libfftw.so, you only have to worry about multiple instances of your own project which is trivial given that you have control over the fftw configuration as part of your project). Then again almost all gnu/linux distros refuse to package a project like that and instead ask to get things upstream..

davisking commented 9 years ago

Yes, I mean do it for all the planners. Everything else we are talking about are a bunch of hacks to work around the problem imposed by a lack of reentrant planning functions in the API. As you and others have pointed out, the other solution involves encouraging fftw users to do things they will either refuse to do or will complicate their build systems and likely lead to package managers (or other downstream users) accidentally building things without the appropriate thread safety switches turned on.

Is it really that hard to push the global variables used by the planners into an argument? I haven't looked at FFTW's code but I can't imagine it's a huge deal to refactor it into that form. Then you implement the existing non-thread-safe APIs in terms of the reentrant APIs to avoid breaking backwards compatibility.

x42 commented 9 years ago

@matteo-frigo Does the problem also exist if fftw is built without --enable-threads but an application calls fftwplan* from different threads?

matteo-frigo commented 9 years ago

@x42 (et al.):

Whether this is a "problem" or not depends on your perspective.

FFTW is meant primarily to be called from one thread and use threads internally to speed up the computation. This is a common mode of operation of numerical libraries: you write your sequential program that calls fftw/atlas/whatever, and the numerical library handles the parallelism. The core of FFTW consists of one library that does not depend on pthreads (e.g., because the user may want to use openmp instead of pthreads, or not want threads at all). --enable-threads produces another library libfftw3_threads that exports a function fftw_init_threads() that installs the threaded solvers into the fftw core. Such a library is optional; --enable-openmp installs openmp solvers instead (and other awful variants exists; for example support for the Cell processor adds solvers that use the coprocessors).

Thus, the intended usage of fftw is the opposite of what the commenters on this ticket have in mind, that is, calling fftw from multiple threads and run fftw independently in each thread.

The root of the problem is that it is hard to have both setups. Since fftw --enable-threads uses threads internally, the internal threads are a shared resource that is hard to manage if we allow multiple threads to enter the planner at the same time. To put it another way, threads do not compose. Attempts to provide a better abstraction parallelism (such as Cilk or TBB) have not won in the marketplace, so we are stuck with the problems of threads.

I think that the suggestion of passing a planner "object" to all planning function is fundamentally misguided because it attacks the syntax of the problem but not its essence. At its root, threaded fftw has a bunch of global resources (internal threads) that cannot be abstracted away via syntactic sugar. Thus, this suggestion is fundamentally incompatible with --enable-threads. (For the record, this suggestion would not be hard to implement. fftw internally does in fact pass an explicit planner "object" pointer everywhere in the way suggested by davisking. The API layer deliberately uses a global planner precisely not to sweep the real problem under the rug.)

I think that my proposal of fftw_make_planner_thread_safe() is the only practical workaround: it recognizes that threads do not compose, that fftw is set up so that parallelism is inside fftw and not outside, it works around the lack of composability via a lock, and it does not break any existing code. This proposal is currently implemented in git, so folks may want to try it out and see if it works for them.

x42 commented 9 years ago

Nice, I can confirm that fftwf_make_planner_thread_safe() works and resolves the issue.

However, so far there is no C-header for that function, and -lfftw3f_threads need to be added manually. It'd be great if pkg-config --libs fftw3f could take care of that.

The test was performed on debian/stable, fftw3 git

# git describe --tags
fftw-3.3.4-55-g8cd9bfa
# ./configure --enable-single --enable-sse --enable-avx --enable-shared --enable-static --enable-threads

PS. How can one easily check if that function is available at compile time? Is that best done during configuration pkg-config --atleast-version=3.3.5 fftw3f or will there be a header define in FFTW3 itself?

x42 commented 9 years ago

PS. the test I ran is https://gist.github.com/x42/07391de61b99e872ad9d without fftwf_make_planner_thread_safe() there are regular crashes.

and as mentioned previously, after calling fftwf_make_planner_thread_safe() everything works fine.

x42 commented 9 years ago

Any news on this?

looks like all that's left to do is adding fftwf_make_planner_thread_safe to the header and -lfftw3f_threads to the pkg-config libs.

Parabola25 commented 9 years ago

Hi guys I m no programmer and got very few basics skills with linux but I love Ardour and calf plugins. Could someone help me resolve the crash when loading some .ardour saved session? I dont understand how this fftw work..! Thx

matteo-frigo commented 9 years ago

What do you mean by "adding to the header"? fftwf_make_planner_thread_safe() is already in the header fftw3.h:

FFTW_EXTERN void X(make_planner_thread_safe)(void);

On Mon, Sep 7, 2015 at 10:55 AM, Robin Gareus notifications@github.com wrote:

Any news on this?

looks like all that's left to do is adding fftwf_make_planner_thread_safe to the header and -lfftw3f_threads to the pkg-config libs.

— Reply to this email directly or view it on GitHub https://github.com/FFTW/fftw3/issues/16#issuecomment-138318123.

x42 commented 9 years ago

indeed, there it is. sorry for the noise. The include works now. -lfftw3f_threads is still needed though. Even when configured with --enable-threads the .pc file only references -lfftw3 (or -lfftw3f in single mode).

PS. I suppose last time I tested the system-wide fftw3.h was preferred over the the local version (using PKG_CONFIG_PATH).

matteo-frigo commented 9 years ago

We cannot add -lfftw*_threads to the pkg-config file, because this would force all applications to link to the pthreads library. This whole topic was about how to avoid forcing such a dependency.

On Tue, Sep 8, 2015 at 11:29 AM, Robin Gareus notifications@github.com wrote:

indeed, there it is. sorry for the noise. The include works now. -lfftw3f_threads is still needed though. Even when configured with --enable-threads the .pc file only references -lfftw3 (or -lfftw3f in single mode).

PS. I suppose last time I tested the system-wide fftw3.h was preferred over the the local version (using PKG_CONFIG_PATH).

— Reply to this email directly or view it on GitHub https://github.com/FFTW/fftw3/issues/16#issuecomment-138600776.

x42 commented 9 years ago

How do you envisage to solve this? A second pkg-config file fftw3[f]-threaded?

x42 commented 8 years ago

I'll close this. The original issue is fixed in fftw3-git.

It'd be nice if there'd be an easy way to find out if a given version of fftw in a system has fftw3_threads. Best solution I can currently think of is a 2nd .pc pkg-config file, but that's a different story.

Thanks to all involved involved in fixing this issue.

stevengj commented 8 years ago

Doesn't fftw_destroy_plan also need a lock, since it accesses shared state for trig tables?

matteo-frigo commented 8 years ago

@stevengj Yes, it's a bug, I'll fix it