Closed: xintin closed this 5 months ago
Well, I see your point. On the other hand, this is a more high-level asymptotic analysis. The data movement is intended to be the amount of data accessed from main (slowest) memory. We are looking at an "ideal" system. Actual data movement really depends on the problem sizes and hardware details, such as cache sizes. I'm not aware of any scientific work that provides a rigorous bound on data movement.
I agree with your points. But what I was referring to is something like slides 40-41 of this deck; your analysis is similar to slide 39 in the same deck.
I think the figures used in James Demmel's slides are more intuitive.
> I'm not aware of any scientific work that provides a rigorous bound on data movement.
If you're interested, there is excellent work by Hong/Kung, Irony et al., and Ballard on this topic (obviously out of scope here).
Lastly, thanks again for this wonderful compilation. This repo is so resourceful.
Hi @xintin, thanks again for your suggestions and the references. I'm going to update the related section. Please take a look and let me know if it is clearer now.
Looks great. Thank you @federico-busato
In slide 29/62, it states that the data movement in a naive matrix multiplication is

`(N^2 · 4B) · 3`

I think it should be

`(N^3 · 4B) · 3`

Explanation:

- Each input matrix has `N^2` elements, and each element is read `N` times, so `N^3` reads per matrix.
- The output matrix has `N^2` elements, and each element is written `N` times (once per accumulated partial value), so `N^3` writes.
- Assuming each floating-point value is `B` bytes (usually `B = 4` for single precision or `B = 8` for double precision), the total data movement in bytes is approximately `3 · N^3 · B`.
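The counting above can be sketched with a small instrumented loop. This is a hypothetical illustration (not from the slides): it assumes every array reference in the naive triple loop goes to slow memory (no cache reuse), counting two reads (one per input matrix) and one write of the accumulated output per innermost iteration, which reproduces the `3 · N^3` access count.

```python
def naive_mm_accesses(N: int) -> int:
    """Count memory accesses of a naive N x N matrix multiplication,
    assuming no cache reuse (every reference hits slow memory)."""
    reads = 0
    writes = 0
    for i in range(N):
        for j in range(N):
            for k in range(N):
                reads += 2   # read A[i][k] and B[k][j]
                writes += 1  # accumulate into C[i][j]
    return reads + writes

N = 8
assert naive_mm_accesses(N) == 3 * N**3  # 1536 accesses for N = 8
```

Multiplying the access count by `B` bytes per element then gives the `3 · N^3 · B` total from the explanation above.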