Closed connoraird closed 2 months ago
The changes in this PR look pretty straightforward to me. It would be interesting to time the loop `do nsf3 = 1, ia%nsup` with both versions of the code and see how much faster the BLAS call is. I guess you could do this with one thread so you don't have to worry about the thread-safety of the timers. With the graphs you've shown in the other PRs, I'm not entirely convinced this is making the code significantly faster.
Results of timing the nsf3 loop:

| threads | 276-combine-nsf-loops | 276-use-blas |
|---|---|---|
| 1 | 2.47956 s | 1.95329 s |
| 2 | 1.45973 s | 1.24583 s |
| 4 | 0.77407 s | 0.67395 s |
| 8 | 0.46875 s | 0.39146 s |
| 16 | 0.27807 s | 0.22739 s |
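
To make the comparison easier to read, the timings can be turned into per-thread-count ratios. This is only a quick sketch for the discussion: the numbers are copied from the table, and the names `timings` and `blas_speedup` are made up here, not taken from the code base.

```python
# Loop timings in seconds, copied from the table:
# threads -> (276-combine-nsf-loops, 276-use-blas)
timings = {
    1: (2.47956, 1.95329),
    2: (1.45973, 1.24583),
    4: (0.77407, 0.67395),
    8: (0.46875, 0.39146),
    16: (0.27807, 0.22739),
}


def blas_speedup(loops_s, blas_s):
    """Return loops time / BLAS time; a value above 1 means BLAS was faster."""
    return loops_s / blas_s


# Hypothetical helper output: one ratio per thread count.
speedups = {threads: blas_speedup(*pair) for threads, pair in timings.items()}

for threads, ratio in speedups.items():
    print(f"{threads:>2} threads: {ratio:.2f}x")
```

On these numbers the BLAS branch runs the nsf3 loop roughly 1.15x to 1.27x faster, depending on thread count. That ratio covers the loop alone, so it says nothing by itself about the change in total run time.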
Description
Speedup plot
This plot shows the performance of test `test_004_isol_C2H4_4proc_PBE0CRI` for 1 MPI process.