Open jeromesimeon opened 4 years ago
Some initial findings:
grammarAst
in the parserManager
.this.grammarAst
and compiling and creating the parser on demand reduces the memory requirement for each template by ~50MB (needs thorough analysis to confirm figures) /**
* Gets a parser object for this template
* @return {object} the parser for this template
*/
getParser() {
return new nearley.Parser(ParserManager.compileGrammar(this.grammar));
}
Update: I must have done something wrong in my original test, that compilation overhead was suspiciously low. Here are some slightly more rigorous test outputs.
Average computation time for Clause.parse
averaged across 100 sequential test runs:
Parser Type | Template | Parse Time |
---|---|---|
Compiled on demand | Acceptance of Delivery | 104.3ms |
Compiled on demand | Late Delivery and Penalty | 47.6ms |
Precompiled | Acceptance of Delivery | 5.3ms |
Precompiled | Late Delivery and Penalty | 3.7ms |
Precompiled
is the current default behaviour, Compiled on demand
uses the alternative implementation of getParser
from above.
New parser showing promises for addressing the parser side of this problem. Pointers and details in #547 (also, the new approach has negligible compilation overhead)
A quick memory snapshot comparing the old and the new parser.
Old:
Trace:
bash-3.2$ node test.js
Before loading: The script uses approximately 46.93 MB
During run: The script uses approximately 79.07 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","contractId":"0f6678a4-2fd7-4d08-8808-5024e3e2b108","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8}
Average time: 6.498220708370209
After run: The script uses approximately 59.03 MB
After run: The script uses approximately 59.19 MB
During run: The script uses approximately 112.81 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","contractId":"d1798965-1500-4975-b0bd-92a7d3ab31ab","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8}
Average time: 7.607243008613587
After run: The script uses approximately 62.77 MB
New:
Trace:
bash-3.2$ node test.js
Before loading: The script uses approximately 55.42 MB
During run: The script uses approximately 89 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8,"contractId":"9934bc09-7bc0-4d20-959a-fecca8bcecd7"}
Average time: 11.38540657043457
After run: The script uses approximately 60.36 MB
After run: The script uses approximately 60.44 MB
During run: The script uses approximately 83.99 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8,"contractId":"caa73b83-b52a-42d5-8809-79e5a93822a2"}
Average time: 10.471894514560699
After run: The script uses approximately 59.5 MB
Test script: test.txt
@mttrbrts Now that Cicero 0.21
is out, it would be worth running those benchmarks again. I hope the new parser will result in some improvements to the memory footprint.
Some preliminary tests (outside Cicero, on the parser itself) can be found at https://github.com/accordproject/cicero/issues/547#issuecomment-618684746
Trace:
➜ cicero-perf node test.js
Before loading: The script uses approximately 40.1 MB
During run: The script uses approximately 79.3 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8,"contractId":"b7387760-732e-4dbd-b2f9-14589a48838b"}
Average time: 8.047086201310158
After run: The script uses approximately 55.38 MB
After run: The script uses approximately 55.46 MB
During run: The script uses approximately 84.88 MB
PARSED {"$class":"org.accordproject.volumediscount.VolumeDiscountContract","firstVolume":1,"secondVolume":10,"firstRate":3,"secondRate":2.9,"thirdRate":2.8,"contractId":"a4b344d7-f535-4cea-a22a-d65342dad328"}
Average time: 7.19361614972353
After run: The script uses approximately 55.79 MB
Performance averaged across 100 runs remains constant between Cicero v0.20 and v0.21. Both demonstrate good throughput with sub 10ms parsing time, excluding file loading.
I'm reluctant to make a generalisation on memory utilization without further analysis. Anecdotally, it appears that parsing a template requires an additional ~30MB for this template in 0.21, compared to the ~50MB with 0.20
It seems like the memory footprint for loaded templates is massive. This needs investigation. Hopefully there should be something relatively obvious we can do to fix that.
To test:
cicero archive --target cicero
./cicero-template-library/src
(after renaming it to./test.js
)test.txt
Here is a trace: