codeplaysoftware / standards-proposals

Repository for publicly sharing proposals in various standards groups
Apache License 2.0

CP013: Create a motivational example (P1795) #129

Open AerialMantis opened 4 years ago

AerialMantis commented 4 years ago

In our last discussion, we decided that we should create a motivational example of how a developer could use the topology discovery design proposed in P1795 to optimise an algorithm such as matrix multiply based on different system architectures.

mhoemmen commented 4 years ago

Do we plan on being able to look up cache sizes at different levels of the memory hierarchy? We could use Strassen for a simple example, and have it use the lowest-level cache size to decide when to stop recursing.
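A minimal sketch of how a cache size could drive the recursion cutoff. P1795 does not yet define how a cache size would be queried, so `cache_bytes` is passed in as a plain number here; the names `max_block_dim` and `recursion_depth` are hypothetical and only illustrate the arithmetic, assuming two input blocks and one output block must fit in the cache at once.

```cpp
#include <cassert>
#include <cstddef>

// Largest block dimension n such that three n x n blocks (two inputs and
// one output) fit into cache_bytes. cache_bytes is an assumed input; it
// stands in for whatever a topology query would eventually report.
std::size_t max_block_dim(std::size_t cache_bytes, std::size_t elem_bytes) {
    assert(elem_bytes > 0);
    const std::size_t elems_per_block = cache_bytes / (3 * elem_bytes);
    std::size_t n = 0;
    while ((n + 1) * (n + 1) <= elems_per_block) {
        ++n;  // integer square root by linear search; fine for a sketch
    }
    return n;
}

// Number of times an n x n problem is halved (rounding up) before a
// sub-block is no larger than the cutoff.
std::size_t recursion_depth(std::size_t n, std::size_t cutoff) {
    assert(cutoff > 0);
    std::size_t depth = 0;
    while (n > cutoff) {
        n = (n + 1) / 2;
        ++depth;
    }
    return depth;
}
```

For example, with an assumed 32 KiB cache and 8-byte elements, `max_block_dim(32 * 1024, sizeof(double))` gives the dimension at which a Strassen-style recursion would switch to a direct kernel.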

AerialMantis commented 4 years ago

I would like to have a property which reflects the various cache levels and their sizes. I'm not entirely sure how best to represent those in a generic way yet; perhaps through a hierarchy of managed memory resources which provide constructive/destructive interference sizes.

Yeah, I like that idea; it would be a good example of using the topology information. So we would recursively divide the matrices into blocks until they fit into the lowest-level cache, and then compute the blocks one at a time, per group of threads sharing that cache.
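The recursive subdivision described above could be sketched roughly as follows. This is plain recursive blocking rather than Strassen, it is sequential (no thread groups), and `cutoff` is an assumed block dimension standing in for a value derived from a topology query. The names `multiply_blocked` and `multiply` are hypothetical, and all three matrices are assumed to be square, row-major, and to share one leading dimension `ld`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (m x n) += A (m x k) * B (k x n). The largest dimension is halved
// until every dimension is at most cutoff, then a direct kernel runs.
void multiply_blocked(const double* a, const double* b, double* c,
                      std::size_t m, std::size_t n, std::size_t k,
                      std::size_t ld, std::size_t cutoff) {
    if (cutoff == 0) cutoff = 1;  // guard against endless recursion
    if (m <= cutoff && n <= cutoff && k <= cutoff) {
        // Block assumed to fit in the lowest-level cache: compute directly.
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t p = 0; p < k; ++p)
                for (std::size_t j = 0; j < n; ++j)
                    c[i * ld + j] += a[i * ld + p] * b[p * ld + j];
        return;
    }
    if (m >= n && m >= k) {  // split the rows of A and C
        const std::size_t h = m / 2;
        multiply_blocked(a, b, c, h, n, k, ld, cutoff);
        multiply_blocked(a + h * ld, b, c + h * ld, m - h, n, k, ld, cutoff);
    } else if (n >= k) {     // split the columns of B and C
        const std::size_t h = n / 2;
        multiply_blocked(a, b, c, m, h, k, ld, cutoff);
        multiply_blocked(a, b + h, c + h, m, n - h, k, ld, cutoff);
    } else {                 // split the shared dimension k
        const std::size_t h = k / 2;
        multiply_blocked(a, b, c, m, n, h, ld, cutoff);
        multiply_blocked(a + h, b + h * ld, c, m, n, k - h, ld, cutoff);
    }
}

// Convenience wrapper for square n x n matrices stored in vectors.
std::vector<double> multiply(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t n, std::size_t cutoff) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    multiply_blocked(a.data(), b.data(), c.data(), n, n, n, n, cutoff);
    return c;
}
```

In a parallel version, each leaf block would presumably be handed to the group of threads that shares the cache it was sized for; that mapping is exactly the information the topology discovery would need to expose.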

It would be interesting to then further generalize this so that larger matrices could be subdivided across NUMA regions as well.
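As a rough illustration of the NUMA step, the rows of the output matrix could first be partitioned into one contiguous range per region, with each region then running the cache-blocked recursion on its own range. The region count is an assumed input here, since the thread does not say how it would be queried, and `split_rows` is a hypothetical helper.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split `rows` rows into `regions` contiguous half-open ranges [begin, end),
// as evenly as possible. `regions` is an assumed value standing in for a
// count of NUMA regions reported by a topology query.
std::vector<std::pair<std::size_t, std::size_t>>
split_rows(std::size_t rows, std::size_t regions) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (regions == 0) return ranges;
    const std::size_t base = rows / regions;
    const std::size_t extra = rows % regions;  // first `extra` ranges get one more row
    std::size_t begin = 0;
    for (std::size_t r = 0; r < regions; ++r) {
        const std::size_t len = base + (r < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```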

AerialMantis commented 4 years ago

I am working on a pseudo-generic algorithm for this, incorporating the various architecture-agnostic information that we will need to be able to query. This is a summary of what I have so far:

AerialMantis commented 4 years ago

From the last heterogeneous C++ call: