Sure, I personally don't mind additional dependencies. We can think about putting parallel execution behind a feature gate. The only thing I consider important is that whether something runs in parallel or not remains the user's choice. If I run thousands of optimizations on a cluster, or if my cost function implementation is itself parallelized, I often don't want the evaluations to run in parallel. This doesn't necessarily need to be handled by a feature gate though; it could also be solved in another way.
At which level would you perform the parallelization? Maybe extending the `ArgminOp` trait with methods `apply_par`, `gradient_par` and so on makes sense? They could be implemented automatically like this (I'm not sure if this compiles):
```rust
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    // Evaluate all parameter vectors in parallel; collect() short-circuits
    // on the first error.
    params.par_iter().map(|p| self.apply(p)).collect()
}
```
This would give automatic parallelization, but it would also leave the implementer of the `ArgminOp` trait the choice of changing it if necessary.
It could look like this behind a feature gate:
```rust
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    #[cfg(feature = "rayon")]
    {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params.iter().map(|p| self.apply(p)).collect()
    }
}
```
Similarly we could have another `compute_all_the_things` method (name is open for suggestions ;)) which computes cost, gradient and Hessian in parallel (or a subset of them, depending on what a particular solver needs).
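Roughly, such a method could look like this with `rayon::join` (just a sketch; `compute_all_the_things` and everything in it is hypothetical, not an existing API):

```rust
// Sketch only: evaluate cost, gradient and Hessian concurrently.
// rayon::join runs the two closures in parallel when worker threads are free.
// NB: in a generic trait context this needs Send/Sync bounds (discussed below).
fn compute_all_the_things(
    &self,
    param: &Self::Param,
) -> Result<(Self::Output, Self::Param, Self::Hessian), Error> {
    let (cost, (gradient, hessian)) = rayon::join(
        || self.apply(param),
        || rayon::join(|| self.gradient(param), || self.hessian(param)),
    );
    Ok((cost?, gradient?, hessian?))
}
```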
Automatically deciding whether parallelization makes sense also sounds like a good idea, given that we already measure the time of each iteration. But as a user, I'd also like to be able to opt out of this.
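To make that concrete, the decision could be as simple as comparing the measured evaluation time against a threshold, combined with a user switch (everything here is hypothetical):

```rust
use std::time::Duration;

// Hypothetical heuristic: parallelize only when a single evaluation is
// expensive enough to outweigh rayon's scheduling overhead, and only if
// the user hasn't opted out.
fn should_parallelize(mean_eval_time: Duration, user_opt_out: bool) -> bool {
    !user_opt_out && mean_eval_time > Duration::from_micros(100)
}
```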
What do you think about this? I have to admit I haven't thought too much about this yet, and my experience with `rayon` is rather limited.
I'm trying to implement Parallel Tempering with Argmin, and something like this would make it much easier.
I was playing with this myself and it seems really easy to implement, thanks to the way `collect` handles `Result`:
```rust
#[cfg(feature = "rayon")]
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    params.par_iter().map(|p| self.apply(p)).collect()
}

#[cfg(feature = "rayon")]
fn modify_par(
    &self,
    params: &[Self::Param],
    extents: &[Self::Float],
) -> Result<Vec<Self::Param>, Error> {
    params
        .par_iter()
        .zip_eq(extents)
        .map(|(p, e)| self.modify(p, *e))
        .collect()
}
```
And solvers can fall back to a default sequential version of these trait methods if the user didn't implement them.
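For completeness, the sequential fallback could live next to these as the non-`rayon` counterpart (a sketch following the same pattern as above):

```rust
// Sketch: compiled when the `rayon` feature is disabled, so solvers can
// call apply_par unconditionally and still get sequential evaluation.
#[cfg(not(feature = "rayon"))]
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    params.iter().map(|p| self.apply(p)).collect()
}
```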
Thanks for having a look at this! I like this approach. However, if we extend the `ArgminOp` trait with `apply_par` and `modify_par`, then I think the `rayon` feature gate should be moved inside the default implementations of the methods:
```rust
fn par_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    #[cfg(feature = "rayon")]
    {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params.iter().map(|p| self.apply(p)).collect()
    }
}

fn par_modify(
    &self,
    params: &[Self::Param],
    extents: &[Self::Float],
) -> Result<Vec<Self::Param>, Error> {
    #[cfg(feature = "rayon")]
    {
        params
            .par_iter()
            .zip_eq(extents)
            .map(|(p, e)| self.modify(p, *e))
            .collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params
            .iter()
            // std's zip (zip_eq is rayon-only); slices are assumed equal length
            .zip(extents)
            .map(|(p, e)| self.modify(p, *e))
            .collect()
    }
}
```
Then it is not necessary to deal with the feature gate in the solvers. Also, in contrast to my previous suggestion, I would prefer `par_{apply,modify}` over `{apply,modify}_par`, since it is more in line with `iter` and `par_iter`.
@TheIronBorn what do you think? Would you be willing to provide a PR for this?
I looked into that, but it seems to add more trait requirements like `Send`/`Sync`. I don't know how reasonable that is for sequential projects. Maybe there's some trait magic we could do.
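For context, here is roughly what those extra bounds look like if the parallel version is a default trait method (a sketch; the exact placement of the bounds is up for discussion):

```rust
// Sketch: par_iter shares &self and the parameter slice across threads,
// which surfaces as Send/Sync bounds on the method (or on the whole trait).
fn par_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error>
where
    Self: Sync,
    Self::Param: Sync,
    Self::Output: Send,
{
    params.par_iter().map(|p| self.apply(p)).collect()
}
```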
Also, it might be good to let the user choose which operations to parallelize, since it might be only the cost function or the Jacobian etc. that is expensive, and parallelizing everything might slow things down.
If the final version doesn't end up parallel by default, perhaps we should reduce the name to `{apply/etc}_many` or `bulk_{apply/etc}`, though that might make the use case less obvious.
Those are all very good points. How do you feel about implementing the sequential (= non-parallelized) version by default and leaving the parallel implementation to the user if needed (for now)? Then users can choose how they want to parallelize their code. In that case, `bulk_{...}` is probably the better choice for the method names.
I feel this would be a rather temporary solution until we come up with something more convenient, but it would at least allow users to provide a parallelized version if they want to.
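To illustrate what that would look like for a user (a self-contained sketch; the `Problem` trait and `Quadratic` type are made up and stand in for the real ones):

```rust
use anyhow::Error; // stand-in for argmin's Error type
use rayon::prelude::*;

// Made-up trait standing in for ArgminOp: it ships a sequential bulk_apply
// default that users may override with a parallel version.
trait Problem {
    type Param;
    type Output;

    fn apply(&self, param: &Self::Param) -> Result<Self::Output, Error>;

    // Default: plain sequential evaluation.
    fn bulk_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
        params.iter().map(|p| self.apply(p)).collect()
    }
}

struct Quadratic;

impl Problem for Quadratic {
    type Param = Vec<f64>;
    type Output = f64;

    fn apply(&self, param: &Self::Param) -> Result<Self::Output, Error> {
        Ok(param.iter().map(|x| x * x).sum())
    }

    // Opt-in: the user overrides the default with a rayon-based version.
    fn bulk_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
}
```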
Yeah I'll do that
Implemented in #219.
The `rayon` feature enables parallel evaluation of the cost function, gradient, operator, Jacobian and Hessian via the `bulk_*` methods. One can still opt out of parallelization for individual problem-related traits. Say you don't want to compute the Hessian in parallel: by implementing the `parallelize` method of the `Hessian` trait to return `false`, it will be computed sequentially.
The `bulk_*` methods can of course also be overridden.
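For example, opting out for the Hessian only would look roughly like this (sketched from the description above; see the PR for the exact signatures):

```rust
// Rough sketch based on the description above; exact trait signatures
// may differ from the actual argmin API.
struct MyProblem;

impl Hessian for MyProblem {
    type Param = Vec<f64>;
    type Hessian = Vec<Vec<f64>>;

    fn hessian(&self, _param: &Self::Param) -> Result<Self::Hessian, Error> {
        // ... actual Hessian computation ...
        unimplemented!()
    }

    // Returning false makes the bulk_hessian default evaluate sequentially,
    // even when the rayon feature is enabled.
    fn parallelize(&self) -> bool {
        false
    }
}
```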
The only solver which uses this so far is Particle Swarm Optimization. I'm happy to hear feedback on the design, in particular if anyone is using this for their own solvers.
Some optimization algorithms require several independent cost function evaluations per iteration. Sometimes (actually most of the time in my applications), evaluating the cost function is costly, and its evaluations dominate the optimization time. When the cost function itself cannot be parallelized, an obvious improvement is to evaluate it at these points in parallel. It could also be desirable to evaluate the cost function and its gradient or Hessian in parallel when applicable.
`rayon` is the perfect library to allow this kind of thing. Would it be okay to add it as a dependency? As a feature? Should we create an option to enable parallel evaluation on demand? We could probably even decide automatically whether it is worth enabling, depending on the duration of the first few evaluations.
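For reference, the core of the idea in plain `rayon` (a minimal, self-contained illustration; the cost function here is just a placeholder):

```rust
use rayon::prelude::*;

// Stand-in for an expensive cost function evaluation.
fn cost(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn main() {
    // Several independent points, as produced e.g. by one iteration of a
    // population-based solver.
    let points: Vec<Vec<f64>> = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];

    // Evaluate the cost at all points in parallel.
    let costs: Vec<f64> = points.par_iter().map(|p| cost(p)).collect();
    println!("{costs:?}");
}
```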