OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

Test longer matrix ranges in matrix multiply #269

Closed tkoskela closed 7 months ago

tkoskela commented 7 months ago

Change DM.L_range to 20 or more.

I am testing this on both develop and #266, using the matrix_multiply input.
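For reference, a minimal sketch of the change described above. It assumes the setting goes in the usual `Conquest_input` file and that distances are in the default units (bohr); 20.0 is only the lower bound suggested here, not a tuned value.

```
DM.L_range 20.0
```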

tkoskela commented 7 months ago

I ran a comparison in VTune. In my test, develop is outperforming #266 in total run time. However, there are some interesting differences.

#266 has less spin time and spends roughly 25% less time in __kmp_fork_barrier. However, it spends 50% more time in __kmpc_barrier. My initial interpretation is that the serialisation of communication and computation we've forced is causing a lot of time to be wasted at the barriers we've put in. #265 seems like the obvious direction to look at next.
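To make that interpretation concrete, here is a toy model of idle time at barriers. It is only a sketch, not a description of what the CONQUEST code or either branch actually does. The function `barrier_wait`, its parameters and the timings are all hypothetical. It assumes one thread handles communication each step and counts only the idle time of the other threads.

```python
def barrier_wait(n_threads, n_steps, t_comm, t_comp, overlap):
    """Total thread-time spent idle at barriers in a toy model.

    Hypothetical model, not taken from the CONQUEST source:
    - serialised (overlap=False): one thread communicates for t_comm while
      the other threads wait at a barrier, then all threads compute;
    - overlapped (overlap=True): one thread communicates while the others
      compute for t_comp, so they wait only if t_comm exceeds t_comp.
    Only the idle time of the non-communicating threads is counted.
    """
    waiting_threads = n_threads - 1
    if overlap:
        per_step = waiting_threads * max(0.0, t_comm - t_comp)
    else:
        per_step = waiting_threads * t_comm
    return n_steps * per_step


# Illustrative numbers only (arbitrary time units, not measured values).
serialised = barrier_wait(8, 100, t_comm=2.0, t_comp=5.0, overlap=False)
overlapped = barrier_wait(8, 100, t_comm=2.0, t_comp=5.0, overlap=True)
print(serialised, overlapped)
```

In this model the serialised idle time grows linearly with the number of threads, so forced serialisation would be expected to cost more at the barriers as the thread count increases.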

Develop

image

#266

image