TritonVM / triton-vm

Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.
https://triton-vm.org
Apache License 2.0
241 stars 36 forks source link

Barycentric Evaluation #266

Closed aszepieniec closed 5 months ago

aszepieniec commented 5 months ago
aszepieniec commented 5 months ago

Previously, the last polynomial in FRI was computed by interpolating the last codeword using an iNTT. This step has been eliminated. Even though the asymptotic complexity dropped from $O(n \log n)$ to $O(n)$ I observed no performance difference. This might change with the length of the last codeword (for instance, if the expansion factor changes). That said, the selling point is the anticipated lower clock cycle count for the recursive verifier.

Sword-Smith commented 5 months ago

Consider rewriting the domain calculation in the barycentric_evaluate to use scan. This removes acc from the scope of the function body.

let generator = BFieldElement::primitive_root_of_unity(codeword.len() as u64).unwrap();
let domain = (0..codeword.len())
    .scan(BFieldElement::one(), |acc, _| {
        let omegai = *acc;
        *acc *= generator;
        Some(omegai)
    })
    .collect_vec();
Sword-Smith commented 5 months ago

The barycentric_evaluate function can also be rewritten to function entirely without heap allocation, which is how I intend to implement it in tasm-lib. Also notice the use of type BFieldElement for domain_iter_elem.

    pub fn barycentric_evaluate(
        codeword: &[XFieldElement],
        indeterminate: XFieldElement,
    ) -> XFieldElement {
        let root_order = codeword.len().try_into().unwrap();
        let generator = BFieldElement::primitive_root_of_unity(root_order).unwrap();
        let mut numerator = xfe!(0);
        let mut denominator = xfe!(0);
        let mut domain_iter_elem = bfe!(1);
        for code_word_elem in codeword {
            let domain_shift_elem = indeterminate - domain_iter_elem;
            let domain_over_domain_shift_elem = domain_iter_elem.lift() / domain_shift_elem;
            numerator += domain_over_domain_shift_elem * *code_word_elem;
            denominator += domain_over_domain_shift_elem;
            domain_iter_elem *= generator;
        }

        numerator / denominator
    }

Due to the batch-inversion being used on the host machine, I can't tell which one is faster though. And for our domain sizes I'm pretty sure either implementation is maximum a few microseconds, so feel free to ignore this suggestion entirely and file it under "Red Herring".