aszepieniec opened 3 months ago
It seems that `par_interpolate` can be made faster for some domain sizes, e.g. $2^{10}$, if evaluation uses "naive parallelism", i.e.:
```rust
impl<FF: FiniteField> Polynomial<FF> {
    fn parallel_naive_evaluate(&self, domain: &[FF]) -> Vec<FF> {
        // `collect_vec` is an itertools method on sequential iterators;
        // rayon's ParallelIterator uses plain `collect`.
        domain.par_iter().map(|x| self.evaluate(x)).collect()
    }
}
```
See the commit message of `fd5add7860eacba2b564f7a89a54c648b1475079` for more info.
The function `par_interpolate` has weird behavior for small domain sizes. In particular, it is faster when (some of) its subroutines are sequential.

There is a lot of potential for optimization here. In general it is okay to rely on dispatcher methods that choose the asymptotically or concretely superior algorithm depending on some threshold, but in the context of parallel hardware we ideally want hardcoded thresholds to be independent of the number of cores/threads. It is allowable to call `available_parallelism` and make a decision based on that. This task involves finding the optimal cascade of specialized functions and the optimal dispatch criteria.
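To make the shape of such a dispatcher concrete, here is a minimal, dependency-free sketch. It uses a toy polynomial over `u64` modular arithmetic instead of the crate's `FiniteField` types, `std::thread::scope` instead of rayon, and a made-up per-thread workload constant; every name and the threshold itself are illustrative assumptions, not the crate's API, and the real cutoff would have to come from benchmarks.

```rust
use std::thread;

// Horner evaluation of one polynomial at one point (toy field: Z mod p).
fn horner(coefficients: &[u64], x: u64, modulus: u64) -> u64 {
    coefficients.iter().rev().fold(0u64, |acc, &c| {
        // Widen to u128 so the multiply-add cannot overflow before reduction.
        ((acc as u128 * x as u128 + c as u128) % modulus as u128) as u64
    })
}

fn sequential_evaluate(coefficients: &[u64], domain: &[u64], modulus: u64) -> Vec<u64> {
    domain.iter().map(|&x| horner(coefficients, x, modulus)).collect()
}

// "Naive parallelism": split the domain into one chunk per thread and
// evaluate each chunk independently with the sequential routine.
fn parallel_naive_evaluate(
    coefficients: &[u64],
    domain: &[u64],
    modulus: u64,
    threads: usize,
) -> Vec<u64> {
    let chunk = ((domain.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = domain
            .chunks(chunk)
            .map(|part| s.spawn(move || sequential_evaluate(coefficients, part, modulus)))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

// Hypothetical dispatcher: the threshold scales with the thread count
// reported at runtime, so the hardcoded constant stays core-independent.
fn evaluate_dispatch(coefficients: &[u64], domain: &[u64], modulus: u64) -> Vec<u64> {
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Assumed minimum number of points per thread for spawning to pay off.
    const MIN_POINTS_PER_THREAD: usize = 64;
    if domain.len() < threads * MIN_POINTS_PER_THREAD {
        sequential_evaluate(coefficients, domain, modulus)
    } else {
        parallel_naive_evaluate(coefficients, domain, modulus, threads)
    }
}
```

The point of the sketch is the shape of the cutoff test: the per-thread constant is fixed, and only the multiplier comes from `available_parallelism`, so the same source compiles to sensible dispatch on any machine.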