Add results for current panama-vector build

amCap1712 commented 3 years ago

I ran the benchmarks on the latest build of the panama-vector repo available at builds.shipilev.net. The results are also available as CSV: https://gist.github.com/amCap1712/92d1d34a724939a2daf45eb2e7bfc6a9

It seems that performance has markedly improved since March. The case that suffers the most is the ASCII-only one. I ran the perfasm profiler with JMH and saw that no vector instructions were generated for it :(.

I also experimented with using static finals instead. It's a WIP; it boosts performance significantly at the cost of not letting the user choose the Vector size. I will open a PR for it shortly.

lemire commented 3 years ago

+1

AugustNagro commented 3 years ago

Thanks, it's great to see they are making progress.

I merged your changes & will run the benchmarks on my desktop.

AugustNagro commented 3 years ago

Please check out the updated readme! https://github.com/AugustNagro/utf8.java#performance

Performance is a lot better now. I also updated the lookup tables to be both fast and configurable (for example LookupTables256). There was no significant performance difference between using an interface that delegates to static fields and hard-coding the static fields.

The JIT will inline dynamic dispatch for up to 2 implementations, so I made sure to split each benchmark into its own class to ensure only the relevant LookupTables class was classloaded on each JVM fork.
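To make the "interface delegating to static fields" idea concrete, here is a minimal sketch of that pattern. It is an illustration only: the class names, method names, and table contents below are hypothetical and are not taken from utf8.java, whose real lookup tables are not shown in this thread.

```java
// Hypothetical sketch: names and table contents are illustrative,
// not the ones used in utf8.java.
public class LookupTablesSketch {

    /** Configurable variant: callers go through an interface. */
    public interface LookupTables {
        int laneCount();
        byte lookup(int index);
    }

    /** 128-bit variant (16 byte lanes); the data lives in static final fields. */
    public static final class Tables128 implements LookupTables {
        private static final int LANES = 16;
        private static final byte[] TABLE = buildTable(LANES);

        @Override public int laneCount() { return LANES; }
        @Override public byte lookup(int index) { return TABLE[index]; }
    }

    /** 256-bit variant (32 byte lanes), same shape as above. */
    public static final class Tables256 implements LookupTables {
        private static final int LANES = 32;
        private static final byte[] TABLE = buildTable(LANES);

        @Override public int laneCount() { return LANES; }
        @Override public byte lookup(int index) { return TABLE[index]; }
    }

    /** Placeholder table contents: entry i holds 2 * i. */
    static byte[] buildTable(int lanes) {
        byte[] table = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            table[i] = (byte) (i * 2);
        }
        return table;
    }

    /** Code written against the interface works with either variant. */
    public static int checksum(LookupTables tables) {
        int sum = 0;
        for (int i = 0; i < tables.laneCount(); i++) {
            sum += tables.lookup(i);
        }
        return sum;
    }
}
```

Because each implementation only forwards to its own static final fields, the interface call is the sole extra cost compared with hard-coding the fields, and that cost is what the JIT can remove when it inlines the call.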

amCap1712 commented 2 years ago

Thanks @AugustNagro. I was also going to test a similar LookupTables approach next but I already like your version better :D. Cheers!

amCap1712 commented 2 years ago

@AugustNagro Regarding JIT inlining: as mentioned here (https://shipilev.net/blog/2015/black-magic-method-dispatch/#_monomorphic_cases), it is per call site. (I consider the benchmark method the call site rather than Utf8.validate, because JMH runs the benchmark body with a don't-inline compiler hint. Also, each benchmark method runs in its own fork, so Utf8.validate would observe only one implementation anyway.) Since each benchmark will always observe the same implementation, all call sites in the benchmark body are monomorphic and can be inlined by the JIT. I tested locally and didn't observe a performance difference when all benchmarks are put in the same class. If you like, I can open a PR to merge the benchmarks into one class.
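The "per call site" point can be illustrated with a small plain-Java sketch. Everything here is hypothetical (the `Tables` interface, both implementations, and the two benchmark-like methods); it uses no JMH and only shows the shape of the argument: two methods in one class, each with its own call site that only ever sees one receiver type. Whether the JIT actually inlines a given call remains the JVM's decision.

```java
// Hypothetical sketch: two benchmark-like methods living in the same class.
public class CallSiteSketch {

    public interface Tables {
        int width();
    }

    public static final class Tables128 implements Tables {
        @Override public int width() { return 16; }
    }

    public static final class Tables256 implements Tables {
        @Override public int width() { return 32; }
    }

    private static final Tables T128 = new Tables128();
    private static final Tables T256 = new Tables256();

    /** Call site A: the receiver here is always a Tables128. */
    public static long bench128(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += T128.width();
        }
        return total;
    }

    /** Call site B: the receiver here is always a Tables256. */
    public static long bench256(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += T256.width();
        }
        return total;
    }
}
```

Both implementations are loaded into the same JVM, yet each `width()` call site observes a single type, which is the situation described above for benchmarks merged into one class.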

AugustNagro commented 2 years ago

Thanks for checking; please do :)

On Fri, Sep 24, 2021, 1:45 AM Kartik Ohri @.***> wrote:

@AugustNagro https://github.com/AugustNagro Regarding JIT inlining, as mentioned here https://shipilev.net/blog/2015/black-magic-method-dispatch/#_monomorphic_cases it is per call site. Since each benchmark will always observe the same implementation, all call sites in the benchmark are monomorphic and JIT inlineable. I tested locally and didn't observe performance difference if all benchmarks are put in the same class. If you desire, I can open a PR to merge the benchmarks into one class.


ohpauleez commented 2 years ago

I'm extremely impressed with the effort and results here - well done! Please keep kicking ass on the road to simdjson!

AugustNagro / utf8.java

Add results for current panama-vector build #3