Koka performance

anfelor commented 4 months ago

Hi Andras, thanks for figuring this out with me!

In our Reddit chat, I wrote that "maybe this boils down to how well optimized GHC is for ARM vs x86", since the numbers I posted:

Koka (with borrowing changes): 994475 us
GHC -N8 -A16M -qb0 -s: 944395 us

diverged from the numbers you posted:

Koka: 947518 us, rss 644mb
GHC -A16M: 573145 us, rss 1738mb

As you can see, Koka is about 5% faster on your machine, while GHC is roughly 65% faster. However, your processor is 30% faster on CineBench and much more so on PassMark. So really, I had it exactly wrong: GHC probably performs as expected, but Koka must be much slower on your machine than I would have predicted from my numbers. I don't think this is due to Koka gaining more from the instruction set than CineBench does, since Daan has an AMD 5950X, which is about 6% faster on CineBench, and in the benchmarks in Appendix B his machine ran Koka only slightly faster than mine.
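For reference, the percentages here are ratios of wall-clock times (first machine over second machine). A minimal Python sketch of that arithmetic, using only the numbers quoted above in microseconds; the `speedup` helper and the idea of dividing by the CineBench ratio as a "hardware-only" prediction are our own illustration, not something from the benchmark repo:

```python
# Wall-clock times quoted in this thread, in microseconds.
first_machine = {"koka": 994_475, "ghc": 944_395}   # Koka with borrowing changes; GHC -N8 -A16M -qb0 -s
second_machine = {"koka": 947_518, "ghc": 573_145}  # Koka; GHC -A16M

# "30% faster on CineBench", as stated in the comment; treated here as the
# speedup that raw hardware alone would predict (an assumption).
CINEBENCH_RATIO = 1.30


def speedup(slow_us: float, fast_us: float) -> float:
    """Ratio of two wall-clock times: 1.0 means equal, 1.5 means 50% faster."""
    return slow_us / fast_us


for lang in ("koka", "ghc"):
    s = speedup(first_machine[lang], second_machine[lang])
    print(
        f"{lang}: {s:.2f}x ({(s - 1) * 100:.0f}% faster), "
        f"{s / CINEBENCH_RATIO:.2f}x relative to the CineBench prediction"
    )
```

The exact ratios come out to about 1.05x for Koka and 1.65x for GHC. Divided by the CineBench ratio, that is roughly 0.81 and 1.27: Koka lands well below what the hardware difference alone would suggest, while GHC lands above it.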

I wonder how we could figure out where this discrepancy comes from. Are you planning to go to PLDI by any chance? If so, we could meet during a break and run some tests together.

AndrasKovacs commented 4 months ago

Hi! I won't be at PLDI. I expect to go to TYPES and ICFP this year. I just ran TreeNF-23 on a laptop as well, on an Intel 1345U (2 P-cores, 8 E-cores). According to your comparison page, this is quite close to the M1. There was more variance here than on my desktop, so I took the best times that were reliably reproducible across many reruns.

koka: 1063 ms, rss 644M
ghc -N8 -A16M -qb0: 782 ms, rss 1740M
ghc -N8 -A32M -qb0: 682 ms, rss 1995M

Note: this is with the latest commit here, including the borrow changes and no elaboration in timings, but with 10 iterations and size 23 for both benchmarks.
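One possible way to make "the best time that is reliably reproducible" precise is to ignore one-off fast outliers and report the fastest time that several other runs came close to. This Python sketch is only our interpretation, not the procedure actually used for the numbers above; the function name and the `min_hits` and `tolerance` parameters are hypothetical:

```python
from typing import Sequence


def reproducible_best(
    timings_ms: Sequence[float], min_hits: int = 3, tolerance: float = 0.02
) -> float:
    """Return the fastest timing t such that at least `min_hits` runs
    (t itself included) finished within t * (1 + tolerance).

    A single lucky run cannot set the result on its own; it has to be
    matched by other runs that land close to it.
    """
    if min_hits < 1 or len(timings_ms) < min_hits:
        raise ValueError("need at least min_hits (>= 1) timings")
    ordered = sorted(timings_ms)
    for t in ordered:
        # Count runs that landed between t and t * (1 + tolerance), t included.
        hits = sum(1 for x in ordered if t <= x <= t * (1 + tolerance))
        if hits >= min_hits:
            return t
    # No cluster is tight enough: fall back to the min_hits-th fastest run.
    return ordered[min_hits - 1]


runs = [100, 112, 104, 103, 105, 150]  # made-up timings in ms, for illustration only
print(reproducible_best(runs))  # 103: the lone 100 ms run is not reproduced
```

Taking the plain minimum would report 100 ms here; requiring a small cluster of nearby runs trades a slightly higher number for one that a rerun is likely to hit again.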

I'm rather busy currently, I'll have more time to fiddle with benchmarks here in 2-3 weeks. I'd definitely like to have more and possibly better benchmarks. Feel free to push whatever code you like in the meantime.

AndrasKovacs commented 4 months ago

I reran the desktop 7700X tests to be sure:

koka: 940ms, rss 644M
ghc -N8 -A16M -qb0: 594ms, rss 1738M
ghc -N8 -A32M -qb0: 556ms, rss 1993M

The -A32M result is significantly worse than before; I wonder whether I used the wrong options or noted the wrong numbers last time. Anyway, I tried more things, and it seems I can make it go a bit faster still:

ghc -N16 -A64M -qb0: 447ms, rss 1907M
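The GHC rows differ only in runtime-system options: `-N` sets the number of capabilities and `-A` the allocation area size. As far as we understand the GHC user's guide, the allocation area is per capability, so the total nursery grows roughly as N × A, which may help explain the higher RSS for the larger settings. A small Python sketch that rebuilds the desktop command lines and compares them with Koka's 940 ms; the binary name `./TreeNF` is hypothetical, and the N × A estimate is an assumption:

```python
from dataclasses import dataclass
from typing import List

KOKA_MS = 940  # Koka on the 7700X, from the rerun above


@dataclass(frozen=True)
class GhcRun:
    capabilities: int   # -N<n>
    alloc_area_mb: int  # -A<size>M
    time_ms: int
    rss_mb: int

    def rts_args(self) -> List[str]:
        # The binary must be built with `ghc -threaded -rtsopts` for these
        # flags to be accepted on the command line. Add "-s" to print GC
        # statistics, as in the first set of numbers in this thread.
        return ["+RTS", f"-N{self.capabilities}", f"-A{self.alloc_area_mb}M", "-qb0", "-RTS"]

    def nursery_mb(self) -> int:
        # Assumption: one allocation area per capability, so roughly N x A.
        return self.capabilities * self.alloc_area_mb


runs = [
    GhcRun(8, 16, 594, 1738),
    GhcRun(8, 32, 556, 1993),
    GhcRun(16, 64, 447, 1907),
]

for r in runs:
    cmd = ["./TreeNF"] + r.rts_args()  # hypothetical binary name
    print(
        " ".join(cmd),
        f"| {KOKA_MS / r.time_ms:.2f}x vs Koka",
        f"| nursery ~{r.nursery_mb()} MB | rss {r.rss_mb} MB",
    )
```

On these numbers the fastest GHC configuration is about 2.1x faster than Koka, while using roughly three times the RSS.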

AndrasKovacs / gc-benchmarks

Koka performance #2