AerialMantis opened this issue 4 years ago (status: Open)
Do we plan on being able to look up cache sizes at different levels of the memory hierarchy? We could use Strassen for a simple example, and have it use the lowest-level cache size to decide when to stop recursing.
I would like to have a property which reflects the various cache levels and their sizes. I'm not entirely sure how best to represent those in a generic way yet; perhaps through a hierarchy of managed memory resources which provide constructive/destructive interference sizes.
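To make the idea concrete, here is a minimal sketch of what a per-level cache description could look like and how an algorithm might turn it into a block size. Everything here is hypothetical: `cache_level`, `innermost_cache` and `max_block_dim` are names made up for illustration and are not part of P1795 or any existing library. It also assumes "lowest-level" means the cache closest to the core (L1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one cache level (illustration only,
// not an API proposed in P1795).
struct cache_level {
    unsigned level;            // 1 = closest to the core, 2 = next, ...
    std::size_t size_bytes;    // capacity of this cache
    std::size_t line_bytes;    // cache line size
    unsigned sharing_threads;  // hardware threads that share this cache
};

// Returns the cache closest to the core (smallest level number).
// Assumption: this is what "lowest-level cache" refers to.
const cache_level& innermost_cache(const std::vector<cache_level>& hierarchy) {
    assert(!hierarchy.empty());
    return *std::min_element(
        hierarchy.begin(), hierarchy.end(),
        [](const cache_level& a, const cache_level& b) { return a.level < b.level; });
}

// Largest b such that three b x b blocks (one each of A, B and C) of
// elements of size elem_bytes fit in the given cache at the same time.
std::size_t max_block_dim(const cache_level& c, std::size_t elem_bytes) {
    assert(elem_bytes > 0);
    std::size_t b = 0;
    while (3 * (b + 1) * (b + 1) * elem_bytes <= c.size_bytes) {
        ++b;
    }
    return b;
}
```

With a 32 KiB L1 cache and `double` elements this gives a block dimension of 36, since 3 * 36 * 36 * 8 bytes fits in 32768 bytes and 3 * 37 * 37 * 8 does not. A real design would probably also want to expose which execution resources share each level, which is what `sharing_threads` hints at.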
Yeah, I like that idea; it would be a good example of using the topology information. So we would recursively divide the matrices into blocks until they fit into the lowest-level cache, and then compute the blocks one at a time, per group of threads sharing the cache.
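A rough single-threaded sketch of that recursion, assuming square row-major matrices of `double` and a cache size that is simply passed in as a parameter (in the real example it would come from the topology query). Note that this is plain recursive blocking with eight sub-multiplications, not Strassen, and it does not yet distribute blocks across groups of threads. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for n x n blocks stored row-major with row stride ld.
void base_multiply_add(const double* A, const double* B, double* C,
                       std::size_t n, std::size_t ld) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * ld + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * ld + j] += a * B[k * ld + j];
        }
}

// Recursively splits into quadrants until the three blocks fit into
// cache_bytes (or the block can no longer be halved evenly).
void recursive_multiply_add(const double* A, const double* B, double* C,
                            std::size_t n, std::size_t ld,
                            std::size_t cache_bytes) {
    const std::size_t working_set = 3 * n * n * sizeof(double);
    if (working_set <= cache_bytes || n % 2 != 0) {
        base_multiply_add(A, B, C, n, ld);
        return;
    }
    const std::size_t h = n / 2;
    // Offsets of the four quadrants inside a row-major block.
    const std::size_t q11 = 0, q12 = h, q21 = h * ld, q22 = h * ld + h;
    // C11 += A11*B11 + A12*B21
    recursive_multiply_add(A + q11, B + q11, C + q11, h, ld, cache_bytes);
    recursive_multiply_add(A + q12, B + q21, C + q11, h, ld, cache_bytes);
    // C12 += A11*B12 + A12*B22
    recursive_multiply_add(A + q11, B + q12, C + q12, h, ld, cache_bytes);
    recursive_multiply_add(A + q12, B + q22, C + q12, h, ld, cache_bytes);
    // C21 += A21*B11 + A22*B21
    recursive_multiply_add(A + q21, B + q11, C + q21, h, ld, cache_bytes);
    recursive_multiply_add(A + q22, B + q21, C + q21, h, ld, cache_bytes);
    // C22 += A21*B12 + A22*B22
    recursive_multiply_add(A + q21, B + q12, C + q22, h, ld, cache_bytes);
    recursive_multiply_add(A + q22, B + q22, C + q22, h, ld, cache_bytes);
}

// Convenience wrapper: returns C = A * B for n x n row-major matrices.
std::vector<double> multiply(const std::vector<double>& A,
                             const std::vector<double>& B,
                             std::size_t n, std::size_t cache_bytes) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    recursive_multiply_add(A.data(), B.data(), C.data(), n, n, cache_bytes);
    return C;
}
```

Calling `multiply(A, B, n, 32 * 1024)` would stop recursing once a triple of blocks fits into 32 KiB. The eight recursive calls at each level are grouped by output quadrant, so each quadrant of C could in principle be handed to a different group of threads sharing a cache.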
It would be interesting to then further generalize this so that larger matrices could be subdivided across NUMA regions as well.
I am working on a pseudo-generic algorithm for this, incorporating the various architecture-agnostic information that we will need to be able to query. This is a summary of what I have so far:
From the last heterogeneous C++ call:
In our last discussion, we decided that we should create a motivational example of how a developer could use the topology discovery design proposed in P1795 to optimise an algorithm such as matrix multiply based on different system architectures.