AndrasKovacs / gc-benchmarks

garbage collection benchmarks
MIT License
Koka performance #2

Open anfelor opened 4 months ago

anfelor commented 4 months ago

Hi Andras, thanks for figuring this out with me!

In our reddit chat, I wrote that "maybe this boils down to how well optimized GHC is for ARM vs x86", since the numbers I posted:

diverged from the numbers you posted:

As you can see, Koka is 5% faster on your machine, while GHC is 60% faster. However, your processor is 30% faster on CineBench and much more on PassMark. So really, I had it exactly wrong: GHC probably performs right, but Koka must be much slower on your machine than I would have expected coming from my numbers. I don't think this is due to Koka gaining more from the instruction set than CineBench does, since Daan has an AMD 5950X, which is about 6% faster on CineBench, and in our benchmarks (Appendix B) his machine executed Koka just a little bit faster than mine.
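
As a rough sanity check of the reasoning above, the observed speedups can be divided by the speedup the CPU benchmark would predict. This is only a sketch: it assumes that "X% faster" means a throughput ratio of 1 + X/100 and that CineBench scores scale linearly with benchmark throughput. Neither assumption is established in this thread, and the helper name `relative_to_expected` is made up for illustration.

```python
# Sketch: normalize observed speedups by the CineBench-predicted speedup.
# Assumption: "5% faster" is read as a throughput ratio of 1.05, and so on.

def relative_to_expected(observed_speedup, expected_speedup):
    """Ratio > 1: faster than the CPU benchmark predicts; < 1: slower."""
    return observed_speedup / expected_speedup

cinebench = 1.30  # the second machine is about 30% faster on CineBench

koka = relative_to_expected(1.05, cinebench)  # Koka: 5% faster
ghc = relative_to_expected(1.60, cinebench)   # GHC: 60% faster

print(f"Koka: {koka:.2f}x of the CineBench-predicted speed")
print(f"GHC:  {ghc:.2f}x of the CineBench-predicted speed")
```

Under these assumptions Koka lands at roughly 0.81 of the predicted speed and GHC at roughly 1.23, which matches the reading that Koka is the outlier. The larger PassMark gap mentioned above would push both ratios lower.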

I wonder how we could figure out where this discrepancy comes from. Are you planning to go to PLDI by any chance? If so, we could meet during a break and run some tests together.

AndrasKovacs commented 4 months ago

Hi! I won't be at PLDI. I expect to go to TYPES and ICFP this year. I just ran TreeNF-23 on a laptop as well, on Intel 1345U (2 P-cores, 8 E-cores). According to your comparison page, this is quite close to M1. There was more variance here than on my desktop, so I took the best times that are reliably reproducible after rerunning a bunch of times.

Note: this is with the latest commit here, including the borrow changes and no elaboration in timings, but with 10 iterations and size 23 for both benchmarks.
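
For anyone who wants to reproduce the "rerun a bunch of times and keep the best" procedure mentioned above, here is a minimal sketch. It is not the script used for the numbers in this thread. It takes the plain minimum over the runs, which is a simplification of "best reliably reproducible", and it measures whole-process wall-clock time, so it includes startup costs that the benchmarks' internal timings exclude. The function name `best_time` and the placeholder command are assumptions.

```python
import subprocess
import sys
import time

def best_time(cmd, runs=10):
    """Run `cmd` (a list of arguments) `runs` times; return the smallest wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the benchmark exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return min(times)

if __name__ == "__main__":
    # Placeholder command: substitute the actual benchmark binary and its arguments.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    print(f"best of 5 runs: {best_time(cmd, runs=5):.3f}s")
```

Taking the minimum suppresses noise from background load, which tends to matter more on laptops with mixed P-cores and E-cores. Reporting the spread alongside it would show how much variance there was.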

I'm rather busy currently, I'll have more time to fiddle with benchmarks here in 2-3 weeks. I'd definitely like to have more and possibly better benchmarks. Feel free to push whatever code you like in the meantime.

AndrasKovacs commented 4 months ago

I reran the desktop 7700X tests to be sure:

The -A32M result is significantly worse than previously; I wonder if I just used the wrong options or copied the wrong numbers last time. Anyway, I tried more things and it seems I can make it go a bit faster still: