HtmlUnit / htmlunit-neko

HtmlUnit adaptation of NekoHtml
Apache License 2.0

Review Neko to increase performance and reduce memory usage #60

Open rschwietzke opened 9 months ago

rschwietzke commented 9 months ago

This is just a bookmark for my ongoing task of tuning without larger rewrites. So far, we got to this:

Wikipedia DE Homepage, DOM Parser, JDK 17, JDK 8 target

Old, v3.8.0

Benchmark                                               Mode  Cnt      Score     Error   Units
HtmlParser_v380_Benchmark.domParser                     avgt    3  1,596,511 ± 136,413   ns/op
HtmlParser_v380_Benchmark.domParser:gc.alloc.rate.norm  avgt    3  1,091,867 ±   1,868    B/op

New, JDK 8 target

Benchmark                                               Mode  Cnt      Score     Error   Units
HtmlParser_v380_Benchmark.domParser                     avgt    3  1,178,706 ± 130,720   ns/op
HtmlParser_v380_Benchmark.domParser:gc.alloc.rate.norm  avgt    3    870,992 ±       0    B/op

New, JDK 11 target

Benchmark                                               Mode  Cnt      Score     Error   Units
HtmlParser_v380_Benchmark.domParser                     avgt    3  1,165,087 ± 112,361   ns/op
HtmlParser_v380_Benchmark.domParser:gc.alloc.rate.norm  avgt    3    870,656 ±       0    B/op
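
For readers unfamiliar with the two metrics in these tables: ns/op is the average time per parse call, and gc.alloc.rate.norm is the number of bytes allocated per call. The sketch below is not JMH and not the benchmark used here. It is a rough hand-rolled illustration of what the two numbers mean, assuming a HotSpot JVM where `com.sun.management.ThreadMXBean` is available. `PerOpMetricsSketch`, `Result`, `measure`, and the dummy workload are hypothetical names.

```java
import java.lang.management.ManagementFactory;

final class PerOpMetricsSketch {
    /** Average time and allocation per operation (hypothetical holder type). */
    static final class Result {
        final double nanosPerOp;
        final double bytesPerOp;

        Result(double nanosPerOp, double bytesPerOp) {
            this.nanosPerOp = nanosPerOp;
            this.bytesPerOp = bytesPerOp;
        }
    }

    // Sink that keeps the JIT from optimizing the dummy allocation away.
    static volatile Object sink;

    static Result measure(Runnable op, int warmupOps, int measuredOps) {
        // Assumption: HotSpot-specific extension of the standard ThreadMXBean.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();

        for (int i = 0; i < warmupOps; i++) {
            op.run(); // give the JIT a chance to compile the hot path first
        }

        long bytesBefore = threads.getThreadAllocatedBytes(threadId);
        long timeBefore = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            op.run();
        }
        long timeAfter = System.nanoTime();
        long bytesAfter = threads.getThreadAllocatedBytes(threadId);

        // Normalize both totals by the number of measured operations.
        return new Result(
                (timeAfter - timeBefore) / (double) measuredOps,
                (bytesAfter - bytesBefore) / (double) measuredOps);
    }

    public static void main(String[] args) {
        // Dummy workload standing in for a parse call: allocates a 1 KiB array.
        Result r = measure(() -> sink = new byte[1024], 10_000, 100_000);
        System.out.printf("%.1f ns/op, %.1f B/op%n", r.nanosPerOp, r.bytesPerOp);
    }
}
```

A real harness such as JMH also handles forking, dead-code elimination, and error estimation, so this loop only serves to read the columns, not to reproduce the numbers.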

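Summary: roughly 26% faster and 20% less memory allocated per parse. A JDK 11 target build is another 1-2% faster than a JDK 8 target build, because javac no longer generates synthetic accessor methods for private access between nested classes when targeting JDK 11. That gap is within the reported error of these runs, though.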
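
The accessor-method remark refers to how javac compiles private member access between an outer class and its nested classes. A minimal sketch, not taken from the Neko sources (`NestAccessDemo` and `Cursor` are hypothetical names): when compiled for a JDK 8 target, javac adds a synthetic `access$...` method to the outer class so the inner class can reach the private field; for a JDK 11 or later target it relies on nest-based access control (JEP 181) and emits no such method. The reflection helper makes the difference visible.

```java
import java.lang.reflect.Method;

class NestAccessDemo {
    private int position; // private field touched by the inner class

    // Inner class that reads and writes a private field of the outer class.
    final class Cursor {
        int advance() {
            return ++position;
        }
    }

    /** Counts compiler-generated (synthetic) methods declared on the outer class. */
    static int syntheticMethodCount() {
        int count = 0;
        for (Method m : NestAccessDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        NestAccessDemo demo = new NestAccessDemo();
        Cursor cursor = demo.new Cursor();
        cursor.advance();
        cursor.advance();
        System.out.println("position = " + demo.position); // position = 2
        // The count depends on the compile target, so no fixed value is claimed here.
        System.out.println("synthetic methods = " + syntheticMethodCount());
    }
}
```

Compiling the same file once with `javac --release 8` and once with `javac --release 11` and comparing the second output line shows whether an accessor was generated.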

rschwietzke commented 8 months ago

Quick update: rebased onto the latest master. All Neko test cases are green.

Wikipedia DE Homepage, DOM Parser, JDK 17, JDK 8 target

Neko 3.9.0

Benchmark                                                  Mode  Cnt        Score        Error   Units
HtmlParser_v380_Benchmark.domParser                        avgt    3  1496302.471 ±  83074.746   ns/op
HtmlParser_v380_Benchmark.domParser:gc.alloc.rate.norm     avgt    3  1089728.077 ±      0.004    B/op

HtmlParser_v380_Benchmark.saxParser                        avgt    3  1279731.554 ± 293476.329   ns/op
HtmlParser_v380_Benchmark.saxParser:gc.alloc.rate.norm     avgt    3   739072.066 ±      0.015    B/op

HtmlParser_v380_Benchmark.simpleParser                     avgt    3  1214197.735 ±  41572.272   ns/op
HtmlParser_v380_Benchmark.simpleParser:gc.alloc.rate.norm  avgt    3   684296.062 ±      0.002    B/op

Neko Tuning Branch

Benchmark                                                  Mode  Cnt        Score        Error   Units
HtmlParser_v380_Benchmark.domParser                        avgt    3  1243972.026 ± 206167.723   ns/op
HtmlParser_v380_Benchmark.domParser:gc.alloc.rate.norm     avgt    3   877664.065 ±      0.034    B/op

HtmlParser_v380_Benchmark.saxParser                        avgt    3   948056.111 ±  65846.940   ns/op
HtmlParser_v380_Benchmark.saxParser:gc.alloc.rate.norm     avgt    3   597336.049 ±      0.003    B/op

HtmlParser_v380_Benchmark.simpleParser                     avgt    3   998957.186 ±  35800.702   ns/op
HtmlParser_v380_Benchmark.simpleParser:gc.alloc.rate.norm  avgt    3   597168.051 ±      0.002    B/op

rschwietzke commented 8 months ago

Final results as part of PR #73. Benchmark used: HtmlParser_v380_Benchmark. Benchmark suite will form its own repo soon.

Runtimes improve in every case, by up to 30%. Memory churn improves by up to 19% on the larger pages. For really small HTML (simple.html) we allocate about 20% more memory, but runtimes still improve.

Data

Benchmark                        File                       v3.9.0      Tuned  Units  Diff (Tuned / v3.9.0)
domParser                        simple.html                54,344     46,837  ns/op   86 %
domParser:gc.alloc.rate.norm     simple.html                66,216     80,536  B/op   122 %
domParser                        small-xc-homepage.html    370,874    291,392  ns/op   79 %
domParser:gc.alloc.rate.norm     small-xc-homepage.html    314,224    286,032  B/op    91 %
domParser                        wikipedia-de-hp.html    1,537,761  1,168,103  ns/op   76 %
domParser:gc.alloc.rate.norm     wikipedia-de-hp.html    1,089,728    877,328  B/op    81 %
domParser                        puma-de-hp.html         9,748,265  6,782,239  ns/op   70 %
domParser:gc.alloc.rate.norm     puma-de-hp.html         5,777,001  4,911,984  B/op    85 %
saxParser                        simple.html                52,867     43,900  ns/op   83 %
saxParser:gc.alloc.rate.norm     simple.html                73,024     88,320  B/op   121 %
saxParser                        small-xc-homepage.html    310,194    232,592  ns/op   75 %
saxParser:gc.alloc.rate.norm     small-xc-homepage.html    241,624    215,040  B/op    89 %
saxParser                        wikipedia-de-hp.html    1,251,809  1,000,298  ns/op   80 %
saxParser:gc.alloc.rate.norm     wikipedia-de-hp.html      739,096    596,624  B/op    81 %
saxParser                        puma-de-hp.html         7,829,730  5,712,766  ns/op   73 %
saxParser:gc.alloc.rate.norm     puma-de-hp.html         3,501,952  2,894,192  B/op    83 %
simpleParser                     simple.html                49,901     43,841  ns/op   88 %
simpleParser:gc.alloc.rate.norm  simple.html                72,896     88,200  B/op   121 %
simpleParser                     small-xc-homepage.html    293,126    261,387  ns/op   89 %
simpleParser:gc.alloc.rate.norm  small-xc-homepage.html    210,432    214,928  B/op   102 %
simpleParser                     wikipedia-de-hp.html    1,233,736    993,234  ns/op   81 %
simpleParser:gc.alloc.rate.norm  wikipedia-de-hp.html      685,184    597,120  B/op    87 %
simpleParser                     puma-de-hp.html         7,422,030  5,894,724  ns/op   79 %
simpleParser:gc.alloc.rate.norm  puma-de-hp.html         3,201,896  2,894,186  B/op    90 %
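
The Diff column is not defined in the text, but the values are consistent with the tuned result expressed as a rounded percentage of the v3.9.0 result, so anything below 100 % is an improvement. A small sketch of that calculation under this assumption (`DiffColumnSketch` and `percentOfBaseline` are hypothetical names; the inputs are copied from the table):

```java
final class DiffColumnSketch {
    /** Tuned result as a rounded percentage of the v3.9.0 baseline. */
    static long percentOfBaseline(long baseline, long tuned) {
        return Math.round(100.0 * tuned / baseline);
    }

    public static void main(String[] args) {
        // Each row is {v3.9.0, Tuned}.
        long[][] rows = {
            {54_344, 46_837},       // domParser, simple.html, ns/op
            {66_216, 80_536},       // domParser, simple.html, B/op
            {9_748_265, 6_782_239}, // domParser, puma-de-hp.html, ns/op
        };
        for (long[] row : rows) {
            System.out.println(percentOfBaseline(row[0], row[1]) + " %");
        }
        // prints 86 %, 122 %, 70 % (one per line)
    }
}
```

Read this way, the 70 % for domParser on puma-de-hp.html is the 30% runtime improvement quoted above, and the 122 % for simple.html is the extra allocation on very small documents.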