Make AvxFftPlanner: Send+Sync

cassiersg commented 3 years ago

Why making FftPlanner: Send+Sync ?

Making FftPlanner: Send+Sync simplifies the usage of a single FftPlanner in a large chunk of an multi-threaded application, hence benefitting of the cache across threads. This allows for instance to put the FftPlanner in a global variable (e.g. using lazy_static!), which is may be useful even if the application is actually not multi-threaded.

There is currently no performance cost in making FftPlanner: Send+Sync, and other parts of the APIs already pay some cost in order to allow multi-threading (e.g. FftPlanner.plan_ftt that returns an Arc).

Changes

make AvxFftPlanner: Send+Sync by adding a Send+Sync bound on the trait AvxPlannerInternalAPI declaration.
make FftPlannerScalar: Send+Sync by replacing the internal Rc<Recipe> by an arena (implemented as a simple Vec, but this can be easily changed to any of the arena crates).
add a unit test that FftPlanner: Send+Sync

ejmahler commented 3 years ago

I would have expected that send+sync doesn't matter for the planners, because all of their functions are &mut, so you have to wrap them in a mutex if you're gonna use them across threads.

Could you give an example of how you use this? I'm on board with supporting it, I just want to make sure I understand what the requirement is.

cassiersg commented 3 years ago

The main usage is indeed probably to wrap the planner in a Mutex. Let us consider the common pattern of multi-thread ref-counted sharing through Arc<Mutex<FftPlanner>>: std::sync::Mutex has the following impls:

impl<T: ?Sized + Send> Send for Mutex<T>
impl<T: ?Sized + Send> Sync for Mutex<T>

while for std::sync::Arc:

impl<T> Send for Arc<T> where  T: Send + Sync + ?Sized

and you want the Arc to be Sen to be able to send it to another thread. Another common case would be to put a Mutex<FftPlanner> in a lazy_static, which requires the Mutex to be Sync.

This makes the case for FftPlanner: Send, but not for FftPlanner: Sync. As you mention, there is no current usage to Sync since the methods on FftPlanner use &mut self. It is possible to remove the impl Sync as an API-stability future-proofing measure (the most robust technique I know of for this is to include a PhantomData<Cell<()>> field in FftPlanner), but I do not see a need for doing that.

ejmahler commented 3 years ago

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to.

cassiersg commented 3 years ago

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile. This is unfortunately not enough since you can build a Mutex with a !Send type, but it is actually impossible to use it across threads (which is arguably quite useless). However, we can make a test that fails to compile if a type is !Send. I added such a test in plan.rs for FftPlanner.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to. The main types that are Send but not Sync in the standard library are Cell and RefCell. To prevent a type from implementing Sync, the best (but imo hacky) way I know is to add a field _non_sync: PhantomData<Cell<()>> to the struct. (I do not know if this is a common/recommended pattern in rust...).

ejmahler commented 3 years ago

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

HEnquist commented 3 years ago

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

Oh right, losing the ability of debug printing a full recipe is a major disadvantage! The main idea behind the recipes was to make debugging and testing easier.

cassiersg commented 3 years ago

The Arc cost is probably not too annoying. At least, it looks to me that since the actual Fft contains as many Arc's as the recipe does, putting Arc's everywhere in the Recipe is at most doubling the Fft construction cost.

I do not know the development history and I don't know the kind of issues that have to be often debugged, but IMHO the new Recipe code has the advantage of making explicit the invariant that there is only one Recipe for each len. That Recipes are explicitly composing Fft's of smaller len makes the Recipe non-recursive. This makes Recipe's correctness a much more local property: it should be correctly composing smaller len's. The correctness of the full FFt recursive build naturally results from all Recipes being locally correct. An example in the way this simplifies the code is that we don't need the Recipe::len function anymore, which was somewhat redundant information.

Nevertheless, as I'm not very knowledgeable in this crate, I you are convinced that Arc'ing the Recipe's is the way to go, I'll be happy implement this in the PR.

ejmahler commented 3 years ago

You make a good point about the invariant. Let me think about it.

ejmahler commented 3 years ago

After thinking about, I realized something that will factor in: If you take a look at the "estimating planner" PR, it adds a cost() method to Recipe - the idea being that you can hypothetically construct several recipes and compare their costs, and choose the recipe with the lowest cost.

If we compare how easy that will be to implement with the index approach vs the Arc approach, i think the Arc approach will be much cleaner.

Based on that, I think we should go with the Arc approach.

ejmahler commented 3 years ago

The invariant is a problem, so i'm glad you pointed it out, and we should make sure to fix it - but that fix doesn't have to be a part of this PR

cassiersg commented 3 years ago

@ejmahler This PR now implements the Arc strategy, which is indeed trivial. Let's take the easy path first.

I did not look at the "estimating planner" PR in depth, but as long as the invariant is valid, the len-as-id solution should work: you build multiple Recipe's compute their costs, and store in the HashMap only the one with the lower cost. The invariant means that you will later always want to use that one (and not the more costly one that you can destroy immediately).

For future reference, here is the commit with the approach based on using the len as the identifiers and storing all Recipe in an HashMap.

ejmahler / RustFFT

Make AvxFftPlanner: Send+Sync #55

Why making FftPlanner: Send+Sync ?

Changes