Neptune-Crypto / twenty-first

Collection of mathematics routines and cryptography for the twenty-first century
GNU General Public License v2.0
72 stars 21 forks source link

Optimize Parallel Version of Interpolate #227

Open aszepieniec opened 1 month ago

aszepieniec commented 1 month ago

Function par_interpolate has weird behavior for small domain sizes. In particular, it is faster when (some of) its subroutines are sequential.

There is a lot of potential for optimization here. In general it is okay to rely on dispatcher methods that choose the asymptotically superior or concretely superior algorithm depending on some threshold, but in the context of parallel hardware we ideally want hardcoded thresholds to be independent of the number of cores/threads. It is allowable to call available_parallelism and make a decision based on that. This task involves finding the optimal cascade of specialized functions and the optimal dispatch criteria.

Sword-Smith commented 1 month ago

It seems that par_interpolate can be made faster for some domain sizes, e.g. $2^{10}$, if evaluation uses "naive parallelism", i.e.:

impl Polynomial<FF: FiniteField> {
    fn parallel_naive_evaluate(&self, domain: &[FF]) -> Vec<FF> {
        domain.par_iter().map(|x| self.evaluate(x)).collect_vec()
    }
}

See commit message in fd5add7860eacba2b564f7a89a54c648b1475079 for more info.