OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

Investigate performance of other multiply kernels #268

Closed (tkoskela closed this 7 months ago)

tkoskela commented 8 months ago

Run times on young with 8 MPI ranks and 4 OpenMP threads per rank, using the matrix_multiply inputs from https://github.com/OrderN/CONQUEST-release/pull/262. These are single runs at this point, so they may contain some run-to-run variation.

The best case is ompDoik, which gives a roughly 2x speedup over the default kernel (140.508 s down to 69.586 s) with 4 threads.

ompTsk segfaulted and didn't produce a run time. Based on the comments in the previous eCSE report it did not seem worth debugging further at this point.

$ tail -n 1 */Conquest_out
==> default/Conquest_out <==
    Total run time was:             140.508 seconds

==> gemm/Conquest_out <==
    Total run time was:             107.899 seconds

==> ompDoii/Conquest_out <==
    Total run time was:              82.969 seconds

==> ompDoik/Conquest_out <==
    Total run time was:              69.586 seconds

==> ompDoji/Conquest_out <==
    Total run time was:              85.187 seconds

==> ompDojk/Conquest_out <==
    Total run time was:              72.171 seconds

==> ompGemm/Conquest_out <==
    Total run time was:              74.297 seconds

==> ompTsk/Conquest_out <==

tkoskela commented 7 months ago

UPDATE: Because of a bug in my script, I had left out the ompGemm_m kernel, which allocates its temporary arrays once before the main loop instead of deallocating and reallocating them on every loop iteration. Its performance is comparable to the ompDoik kernel (69.116 s vs 69.586 s).
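For readers unfamiliar with the pattern, here is a minimal Python sketch of hoisting a scratch allocation out of a loop. It is only an illustration of the general idea: the actual kernels are Fortran, and the function names (`multiply_into`, `sum_products_realloc`, `sum_products_hoisted`) and the block layout are hypothetical, not taken from CONQUEST.

```python
def multiply_into(out, a, b, n):
    """Accumulate the product of two n x n row-major blocks into out."""
    for i in range(n):
        for k in range(n):
            a_ik = a[i * n + k]
            for j in range(n):
                out[i * n + j] += a_ik * b[k * n + j]


def sum_products_realloc(pairs, n):
    """Allocate a fresh scratch block on every iteration."""
    total = [0.0] * (n * n)
    for a, b in pairs:
        tmp = [0.0] * (n * n)  # new allocation each time round the loop
        multiply_into(tmp, a, b, n)
        for idx in range(n * n):
            total[idx] += tmp[idx]
    return total


def sum_products_hoisted(pairs, n):
    """Allocate the scratch block once, then reset and reuse it."""
    total = [0.0] * (n * n)
    tmp = [0.0] * (n * n)  # single allocation before the main loop
    for a, b in pairs:
        for idx in range(n * n):
            tmp[idx] = 0.0  # clear the previous iteration's contents
        multiply_into(tmp, a, b, n)
        for idx in range(n * n):
            total[idx] += tmp[idx]
    return total


# Two identical 2 x 2 block pairs, purely as illustrative input.
pairs = [([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])] * 2
print(sum_products_realloc(pairs, 2))
print(sum_products_hoisted(pairs, 2))
```

Both variants return the same result; the hoisted one simply avoids repeated allocation, which is the kind of saving the ompGemm_m timing below appears to reflect.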

==> default/Conquest_out <==
    Total run time was:             140.508 seconds

==> gemm/Conquest_out <==
    Total run time was:             107.899 seconds

==> ompDoii/Conquest_out <==
    Total run time was:              82.969 seconds

==> ompDoik/Conquest_out <==
    Total run time was:              69.586 seconds

==> ompDoji/Conquest_out <==
    Total run time was:              85.187 seconds

==> ompDojk/Conquest_out <==
    Total run time was:              72.171 seconds

==> ompGemm/Conquest_out <==
    Total run time was:              74.297 seconds

==> ompGemm_m/Conquest_out <==
    Total run time was:              69.116 seconds

==> ompTsk/Conquest_out <==
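
The speedups relative to the default kernel follow directly from the run times listed above. A small Python sketch of that arithmetic is given here; the timings are copied from the output, while the `speedups` helper and the choice of `default` as the baseline are my own assumptions for illustration. ompTsk is left out because it produced no run time.

```python
# Total run times in seconds, copied from the Conquest_out listing.
timings = {
    "default": 140.508,
    "gemm": 107.899,
    "ompDoii": 82.969,
    "ompDoik": 69.586,
    "ompDoji": 85.187,
    "ompDojk": 72.171,
    "ompGemm": 74.297,
    "ompGemm_m": 69.116,
}


def speedups(times, baseline="default"):
    """Return baseline_time / kernel_time for every kernel."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


result = speedups(timings)

# Print kernels from fastest to slowest.
for name, factor in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {timings[name]:8.3f} s  {factor:5.2f}x")
```

Since these are single runs, differences of a percent or so between kernels (for example ompGemm_m vs ompDoik) should probably not be read as significant.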