[Perf] Use heuristics to avoid allocations in Sanitizer::str_till_eol

AleoNet / snarkVM

A Virtual Machine for Zero-Knowledge Executions

https://snarkvm.org

Apache License 2.0

1.08k stars 1.5k forks source link

[Perf] Use heuristics to avoid allocations in Sanitizer::str_till_eol #2563

Closed vicsn closed 1 week ago

vicsn commented 1 month ago

Motivation

Replaces https://github.com/AleoNet/snarkVM/pull/2517

Original PR message:

(transferred from https://github.com/ProvableHQ/snarkVM/pull/6, and asking @d0cd for a review as suggested there)

The Sanitizer is used very prominently in our parsing functions, and it is also a source of many allocations, most of which are temporary and avoidable.

The potential perf improvements are quite large, and I've measured them both with a 15-minute run of a --dev node and using hyperfine on a small binary that parsed all the valid .aleo programs currently present in the snarkVM codebase.

dev node:

all allocs are down ~15%, of which almost all are temporary
in Program::from_str specifically, allocs are reduced by ~64%, of which temp allocs ~88%

parsing all .aleo programs using Program::from_str:

allocs are reduced by ~41%
growth reallocs are down ~70%
runtime is reduced by ~31%

Test Plan

CI run link

ljedrz commented 2 weeks ago

@vicsn I updated the original branch with 2 commits addressing the notes on extra docs and the redundant use of eol. Feel free to cherry-pick those, they should apply cleanly.

@d0cd as for extra tests, note that str_till_eol is only ever used in parse_comment, and there are already several applicable test cases for it, including ones that would utilize the faster approach. Do you feel like those are missing any variants?

d0cd commented 2 weeks ago

@ljedrz on initial scan, I don't see test case for a multi-line comment, multiple single line comments, multiple multi-line comments, and interleavings of the two. I'll leave it to your discretion, but I would at least recommend a multi-line comment and multiple single line comments.

I also saw that you use eoi in the update. Looks good to me, but I would also recommend explaining in a comment that this is valid because before is a single line string.

The reason I am being so insistent on comments is that these parsers are quite sensitive and not many people have a deep expertise in nom.

ljedrz commented 2 weeks ago

@vics I addressed the comments in 2 new commits:

ljedrz commented 1 week ago

While not a big deal (extra test cases, docs, and one optimization), this PR was still missing the 4 extra commits mentioned at the end. I can include them in a follow-up shortly.

vicsn commented 1 week ago

@ljedrz apologies I overlooked that I had to update my branch