Performance - Githubissues

danwallach commented 2 years ago

I'm curious if you've looked at performance tuning. I built a relatively simple benchmark that just does ElGamal encryptions and decryptions, over and over, so lots of modular exponentiations, and I've found that, on my ~2013 MacPro:

GnuMP: ~200 ops/sec Java's BigInteger: ~100 ops/sec kt-math: ~33 ops/sec

That factor of three kinda hurts. I haven't (yet) gotten lost in detailed profiling of the code. The Java BigInteger code is doing the same things as kt-math, so it's really unclear what's going on.

gciatto commented 2 years ago

I definitely didn't, and thank you for that BTW.

The project started as an experiment to test the Kotlin Multiplatform feature, and ended up being the only way to support big numbers in multiplatform projects.

I believe there's a "simple" workaround here, which could also be a wise design choice: I could make BigInteger and BigDecimal interfaces. Then I could use Java's official Big* classes to implement those interfaces on JVM, and only use the Kotlin code on other platforms.

That should reduce the performance loss on the JVM.

gciatto commented 2 years ago

@danwallach can you provide the code of the benchmark you exploited to perform your performance assessment?

I'm in the process of making BigInteger and BigDecimal interfaces, so that on the JVM the "original" implementations can be used behind the scenes.

I'd like to assess whether this change provokes an improvement in performance or not, before releasing it.

It may also be useful for profiling.

danwallach commented 2 years ago

It was buried in a larger codebase. I'll see if I can extract something useful for you.

Meanwhile, off topic, but I'm still trying to figure out the subtleties of multiplatform Kotlin, and it would be handy if you generated "IR" as well as "LEGACY" libraries.

danwallach commented 2 years ago

Okay, this was a bunch of fun to put together. This repo carves out the benchmark aspect of a much larger thing I'm working on. This particular repo is also JVM-only, although there's no reason that the kt-math part wouldn't work elsewhere. https://github.com/danwallach/electionguard-kotlin-benchmark

./gradlew run should be all you need to launch the benchmark.

danwallach commented 2 years ago

FWIW: What I'm really working on is trying to build a common interface so that I can get java.math.BigInteger on the JVM or Android, Microsoft HACL* on Kotlin-Native, and I'm-not-sure-yet on JavaScript.

HACL* is C code that was generated by a formally verified toolchain. It only supports 256-bit or 4096-bit bignum arithmetic, but that's exactly the only part I care about.

For JavaScript, I was initially using kt-math, but I'm currently playing around with gmp-wasm, which is nightmarish to build all the Kotlin interfaces for, despite being written in TypeScript. But if I can get it to work, it's going to be really fast. Hopefully. Maybe.

danwallach commented 2 years ago

Also FWIW, here are the results of running the benchmark on my 2013 MacPro (6-core Xeon, 3.5GHz):

Initializing benchmark for NO_ACCELERATION (kt-math)
  = 31.19833 ops/sec
Initializing benchmark for NO_ACCELERATION (java.math.BigInteger)
  = 94.75081 ops/sec
Initializing benchmark for LOW_MEMORY_USE (kt-math)
  = 68.38542 ops/sec
Initializing benchmark for LOW_MEMORY_USE (java.math.BigInteger)
  = 155.78751 ops/sec
Initializing benchmark for HIGH_MEMORY_USE (kt-math)
  = 74.28869 ops/sec
Initializing benchmark for HIGH_MEMORY_USE (java.math.BigInteger)
  = 174.97813 ops/sec
Initializing benchmark for EXTREME_MEMORY_USE (kt-math)
  = 75.14277 ops/sec
Initializing benchmark for EXTREME_MEMORY_USE (java.math.BigInteger)
  = 183.28446 ops/sec

The punchline seems to be, regardless of the acceleration tricks that I use, which replace modular exponentiation with multiplies out of a huge pre-computed table, we're talking about a 2-3x slowdown relative to java.math.BigInteger.

danwallach commented 2 years ago

In other news: my same benchmark, running with Kotlin/Native using Microsoft's HACL*, seems to run at 1/10 the performance of java.math.BigInteger. This is code that was advertised as running at the full native speed of something like GnuMP. Not even in the ballpark! I'm still trying to investigate why this is. I appear to be using all the right optimizations, but I haven't been a serious C hacker for many years.

I haven't benchmarked kt-math on Kotlin/Native, as opposed to the JVM, but now I'm thinking it might be preferable to HACL*, if I can't fix it.

Food for thought: how would you make kt-math run faster? For starters, you'd really want to do things with ULong rather than Int, since every modern machine is 64 bits. Still, you end up missing out on any way to say "give me the upper 64 bits of the 128-bit result from a 64x64 multiply", which would be really important.

gciatto commented 2 years ago

Okay, this was a bunch of fun to put together. This repo carves out the benchmark aspect of a much larger thing I'm working on. This particular repo is also JVM-only, although there's no reason that the kt-math part wouldn't work elsewhere. https://github.com/danwallach/electionguard-kotlin-benchmark

./gradlew run should be all you need to launch the benchmark.

Thanks for sharing. I will use that for benchmarking.

Also FWIW, here are the results of running the benchmark on my 2013 MacPro (6-core Xeon, 3.5GHz)

Have you considered using a profiler to understand which methods kt-math spends the most time in?

Food for thought: how would you make kt-math run faster? I prefer not to guess with optimizations. I'll use a profiler on your benchmark, so that I can identify which pieces of kt-math waste more time.

Personally, I expect that Kotlin wastes a lot more time than Java because of null checks. So, I will probably have to add a lot of nullable types here and there, in any case.

gciatto commented 2 years ago

However, I believe that attempting to optimize a generated code piece that was tailored on Java, may not be the best strategy. The purpose of kt-math was to provide a multi-platform API for BigIntegers and BigDecimals. Porting the whole implementation was just a "quick and dirty" strategy to let Java and JS classes expose the exact same behaviour.

However, in principle, there is no reason not to rely on the original, well-performing, JVM classes behind the scenes. That's in practice what I'm trying to do within the feature/interfaces branch. I hope that will improve performances, at least on the JVM.

Of course, optimizations on other platforms will still be required. But keep in mind that, if a better performing library for BigIntegers and BigDecimals is available on platform X, I may consider to exploit that behind the scenes of kt-math instead of relying on my translated code, which is slow.

danwallach commented 2 years ago

Agreed. The low-hanging fruit here is using java.math.BigInteger on JVM and Android, especially since the latter bottoms out in their SSL native code.

I've looked at a few JavaScript libraries, but you inevitably run into async / promise APIs as in gmp-wasm. You'd have to make a high level decision to use suspend functions as part of the high level interface on every platform.

For native code, I think the most performant library is going to be GMP.

Meanwhile, you seem to have a difference between your library and java.math.BigInteger. To get the behavior of their "mod", I'm using your "rem".

danwallach commented 2 years ago

So I managed to get kt-math running on Node (via Kotlin/JS), and the same ElGamal encryptions that used to run at 30 encryptions/sec now take two seconds per encryption. So, factor of 60 slowdown. That's.... almost certainly not your fault..... but it's impressively bad.

danwallach commented 2 years ago

So I spent a few more days beating on trying to get gmp-wasm to work, and ultimately gave up. Things got weird and crashy. That's too bad, because it seems like it's the right way to go. If you want to tackle this at some future date, let me know and I'll try to get you at least to the point where I got when I gave up.

danwallach commented 2 years ago

A few new details that you may find relevant:

Turns out, the modular exponentiation from the bigint-mod-arith package (https://github.com/juanelas/bigint-mod-arith), which builds on JavaScript's native bigint type, is actually quite fast. Not as fast as something like GMP running natively, but I'm still getting north of 1000 modular exponentiations with a 4000-bit modulus, on my 2013-era MacPro. Not bad.

So far as I can tell, the bigint type, itself, is not yet exposed in Kotlin/JS. This suggests that you could use native JS code to build a wrapper class around the JS bigint class, and then use that wrapper from Kotlin/JS code to implement the rest of the functionality. Maybe you build on this specific bigint-mod-arith package, or you end up forking it to build the infrastructure you need, but it's worth pondering.

gciatto / kt-math

Performance #8