Sure, I personally don't mind additional dependencies. We can think about putting parallel execution behind a feature gate. The only thing I consider important is that whether something runs in parallel or not remains the user's choice. If I run thousands of optimizations on a cluster, or if my cost function implementation is itself parallelized, I often don't want the evaluations to run in parallel. This doesn't necessarily need to be handled by a feature gate though; it could also be solved in another way.
At which level would you perform the parallelization? Maybe extending the `ArgminOp` trait with methods `apply_par`, `gradient_par` and so on makes sense? They could be implemented automatically like this (I'm not sure if this compiles):
```rust
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    // Evaluate all parameter vectors in parallel; collect() short-circuits
    // on the first error.
    params.par_iter().map(|p| self.apply(p)).collect()
}
```
This would give automatic parallelization, but it would also leave the implementer of the `ArgminOp` trait the choice of changing it if necessary.
It could look like this behind a feature gate:
```rust
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    #[cfg(feature = "rayon")]
    {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params.iter().map(|p| self.apply(p)).collect()
    }
}
```
Similarly we could have another `compute_all_the_things` method (name is open for suggestions ;)) which computes cost, gradient and Hessian in parallel (or a subset of them, depending on what a particular solver needs).
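Roughly, such a method could look like this with `rayon::join` (just a sketch; `compute_all_the_things` and everything in it is hypothetical, not an existing API):

```rust
// Sketch only: evaluate cost, gradient and Hessian concurrently.
// rayon::join runs the two closures in parallel when worker threads are free.
// NB: in a generic trait context this needs Send/Sync bounds (discussed below).
fn compute_all_the_things(
    &self,
    param: &Self::Param,
) -> Result<(Self::Output, Self::Param, Self::Hessian), Error> {
    let (cost, (gradient, hessian)) = rayon::join(
        || self.apply(param),
        || rayon::join(|| self.gradient(param), || self.hessian(param)),
    );
    Ok((cost?, gradient?, hessian?))
}
```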
Automatically deciding whether parallelization makes sense also sounds like a good idea, given that we already measure the time of each iteration. But as a user, I'd also like to be able to opt out of this.
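To make that concrete, the decision could be as simple as comparing the measured evaluation time against a threshold, combined with a user switch (everything here is hypothetical):

```rust
use std::time::Duration;

// Hypothetical heuristic: parallelize only when a single evaluation is
// expensive enough to outweigh rayon's scheduling overhead, and only if
// the user hasn't opted out.
fn should_parallelize(mean_eval_time: Duration, user_opt_out: bool) -> bool {
    !user_opt_out && mean_eval_time > Duration::from_micros(100)
}
```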
What do you think about this? I have to admit I haven't thought too much about this yet, and my experience with `rayon` is rather limited.
I'm trying to implement Parallel Tempering with Argmin, and something like this would make it much easier.
I was playing with this myself and it seems really easy to implement, thanks to the way `collect` handles `Result`:
```rust
#[cfg(feature = "rayon")]
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    params.par_iter().map(|p| self.apply(p)).collect()
}

#[cfg(feature = "rayon")]
fn modify_par(
    &self,
    params: &[Self::Param],
    extents: &[Self::Float],
) -> Result<Vec<Self::Param>, Error> {
    params
        .par_iter()
        .zip_eq(extents)
        .map(|(p, e)| self.modify(p, *e))
        .collect()
}
```
And solvers can fall back to a default sequential version of these trait methods if the user didn't implement them.
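For completeness, the sequential fallback could live next to these as the non-`rayon` counterpart (a sketch following the same pattern as above):

```rust
// Sketch: compiled when the `rayon` feature is disabled, so solvers can
// call apply_par unconditionally and still get sequential evaluation.
#[cfg(not(feature = "rayon"))]
fn apply_par(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    params.iter().map(|p| self.apply(p)).collect()
}
```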
Thanks for having a look at this! I like this approach. However, if we extend the `ArgminOp` trait with `apply_par` and `modify_par`, then I think the `rayon` feature gate should be moved inside the default implementations of the methods:
```rust
fn par_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
    #[cfg(feature = "rayon")]
    {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params.iter().map(|p| self.apply(p)).collect()
    }
}

fn par_modify(
    &self,
    params: &[Self::Param],
    extents: &[Self::Float],
) -> Result<Vec<Self::Param>, Error> {
    #[cfg(feature = "rayon")]
    {
        params
            .par_iter()
            .zip_eq(extents)
            .map(|(p, e)| self.modify(p, *e))
            .collect()
    }
    #[cfg(not(feature = "rayon"))]
    {
        params
            .iter()
            // std's zip (zip_eq is rayon-only); slices are assumed equal length
            .zip(extents)
            .map(|(p, e)| self.modify(p, *e))
            .collect()
    }
}
```
Then it is not necessary to deal with the feature gate in the solvers. Also, in contrast to my previous suggestion, I would prefer `par_{apply,modify}` over `{apply,modify}_par`, since it is more in line with `iter` and `par_iter`.
@TheIronBorn what do you think? Would you be willing to provide a PR for this?
I looked into that, but it seems to add more trait requirements like `Send`/`Sync`. I don't know how reasonable that is for sequential projects. Maybe there's some trait magic we could do.
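For context, here is roughly what those extra bounds look like if the parallel version is a default trait method (a sketch; the exact placement of the bounds is up for discussion):

```rust
// Sketch: par_iter shares &self and the parameter slice across threads,
// which surfaces as Send/Sync bounds on the method (or on the whole trait).
fn par_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error>
where
    Self: Sync,
    Self::Param: Sync,
    Self::Output: Send,
{
    params.par_iter().map(|p| self.apply(p)).collect()
}
```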
Also, it might be good to let the user choose which operations to parallelize, since it might be only the cost function or the Jacobian etc. that is expensive, and parallelizing everything might slow things down.
If the final version doesn't end up parallel by default, perhaps we should reduce the name to `{apply/etc}_many` or `bulk_{apply/etc}`, though that might make the use case less obvious.
Those are all very good points. How do you feel about implementing the sequential (= non-parallelized) version by default and leaving the parallel implementation to the user if needed (for now)? Then users can choose how they want to parallelize their code. In that case, `bulk_{...}` is probably the better choice for the method names.
I feel this would be a rather temporary solution until we come up with something more convenient, but it would at least allow users to provide a parallelized version if they want to.
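To illustrate what that would look like for a user (a self-contained sketch; the `Problem` trait and `Quadratic` type are made up and stand in for the real ones):

```rust
use anyhow::Error; // stand-in for argmin's Error type
use rayon::prelude::*;

// Made-up trait standing in for ArgminOp: it ships a sequential bulk_apply
// default that users may override with a parallel version.
trait Problem {
    type Param;
    type Output;

    fn apply(&self, param: &Self::Param) -> Result<Self::Output, Error>;

    // Default: plain sequential evaluation.
    fn bulk_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
        params.iter().map(|p| self.apply(p)).collect()
    }
}

struct Quadratic;

impl Problem for Quadratic {
    type Param = Vec<f64>;
    type Output = f64;

    fn apply(&self, param: &Self::Param) -> Result<Self::Output, Error> {
        Ok(param.iter().map(|x| x * x).sum())
    }

    // Opt-in: the user overrides the default with a rayon-based version.
    fn bulk_apply(&self, params: &[Self::Param]) -> Result<Vec<Self::Output>, Error> {
        params.par_iter().map(|p| self.apply(p)).collect()
    }
}
```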
Yeah I'll do that
Implemented in #219.
The `rayon` feature enables parallel evaluation of the cost function, gradient, operator, Jacobian and Hessian via the `bulk_*` methods. One can still opt out of parallelization for individual problem-related traits. Say you don't want to compute the Hessian in parallel: by implementing the `parallelize` method of the `Hessian` trait to return `false`, it will be computed sequentially.
The `bulk_*` methods can of course also be overridden.
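For example, opting out for the Hessian only would look roughly like this (sketched from the description above; see the PR for the exact signatures):

```rust
// Rough sketch based on the description above; exact trait signatures
// may differ from the actual argmin API.
struct MyProblem;

impl Hessian for MyProblem {
    type Param = Vec<f64>;
    type Hessian = Vec<Vec<f64>>;

    fn hessian(&self, _param: &Self::Param) -> Result<Self::Hessian, Error> {
        // ... actual Hessian computation ...
        unimplemented!()
    }

    // Returning false makes the bulk_hessian default evaluate sequentially,
    // even when the rayon feature is enabled.
    fn parallelize(&self) -> bool {
        false
    }
}
```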
The only solver which uses this so far is Particle Swarm Optimization. I'm happy to hear feedback on the design, in particular if anyone is using this for their own solvers.
Some optimization algorithms require several independent cost function evaluations per iteration. Sometimes (actually most of the time in my applications), evaluating the cost function is costly, and its evaluations dominate the optimization time. When the cost function itself cannot be parallelized, an obvious improvement is to evaluate it at these points in parallel. It could also be desirable to evaluate the cost function and its gradient or Hessian in parallel when applicable.
`rayon` is the perfect library to allow this kind of thing. Would it be okay to add it as a dependency? As a feature? Should we create an option to enable parallel evaluation on demand? We could probably even decide automatically whether it is worth enabling, depending on the duration of the first few evaluations.
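For reference, the core of the idea in plain `rayon` (a minimal, self-contained illustration; the cost function here is just a placeholder):

```rust
use rayon::prelude::*;

// Stand-in for an expensive cost function evaluation.
fn cost(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn main() {
    // Several independent points, as produced e.g. by one iteration of a
    // population-based solver.
    let points: Vec<Vec<f64>> = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];

    // Evaluate the cost at all points in parallel.
    let costs: Vec<f64> = points.par_iter().map(|p| cost(p)).collect();
    println!("{costs:?}");
}
```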