OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

276 use blas #323

Closed connoraird closed 2 months ago

connoraird commented 2 months ago

Description

Speedup plot

This plot shows the performance of test test_004_isol_C2H4_4proc_PBE0CRI for 1 MPI process.

[Plot: 276-use-blas]

connoraird commented 2 months ago

The changes in this PR look pretty straightforward to me. It would be interesting to time the loop `do nsf3 = 1, ia%nsup` with both versions of the code and see how much faster the BLAS call is. I guess you could do this with one thread so you don't have to worry about the thread safety of the timers. Going by the graphs you've shown in the other PRs, I'm not entirely convinced this is making the code significantly faster.

Results of timing nsf3 loop

| Threads | 276-combine-nsf-loops | 276-use-blas |
|---|---|---|
| 1 | 2.47956 s | 1.95329 s |
| 2 | 1.45973 s | 1.24583 s |
| 4 | 0.77407 s | 0.67395 s |
| 8 | 0.46875 s | 0.39146 s |
| 16 | 0.27807 s | 0.22739 s |
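
To put a number on "how much faster", the ratio of the two timings can be worked out per thread count. This is only a minimal sketch: the timings are copied from the results above, while the names `TIMINGS` and `speedups` are made up for illustration and are not part of CONQUEST.

```python
# Timings in seconds for the nsf3 loop, copied from the results above.
# Keys are thread counts; values are (276-combine-nsf-loops, 276-use-blas).
TIMINGS = {
    1: (2.47956, 1.95329),
    2: (1.45973, 1.24583),
    4: (0.77407, 0.67395),
    8: (0.46875, 0.39146),
    16: (0.27807, 0.22739),
}


def speedups(timings):
    """Return {threads: loop_time / blas_time} for each thread count."""
    return {n: loops / blas for n, (loops, blas) in timings.items()}


if __name__ == "__main__":
    for n, ratio in speedups(TIMINGS).items():
        print(f"{n:>2} threads: {ratio:.2f}x")
```

On these numbers the BLAS version is roughly 1.15x to 1.27x faster in the timed loop, depending on the thread count.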

[Plot: 276-combine-nsf-loops and 276-use-blas]