exo-lang / exo

Exocompilation for productive programming of hardware accelerators
https://exo-lang.dev
MIT License

Target-specific, User-defined Libraries #474

Open rachitnigam opened 1 year ago

rachitnigam commented 1 year ago

I was thinking through a specific use case for Exo: what would it take to convince people building small but custom compilers for their internal accelerator X to instead use Exo? We've recently been thinking about building higher-level libraries that mimic the capabilities of other user-schedulable languages like Halide, but it is not clear to me that this use case is satisfied by that approach.

Bear with me with this setup for a moment. Here is a table of targets ($T_i$) and kernels ($K_i$):

|      | K1 | K2 | K3 |
|------|----|----|----|
| T1   |    |    |    |
| T2   |    |    |    |
| T3   |    |    |    |

An Exo approach requires $9$ different schedules in this case: one for each of the $3 \times 3$ (target, kernel) pairs. When writing these schedules, there are three different ways to abstract the scheduling code to enable reuse:

1. Domain-specific operators that exploit the structure of the application domain the kernels come from.
2. Target-specific operators that exploit knowledge of a particular backend $T_i$.
3. Generic operators that apply regardless of domain or target.
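To make the contrast concrete, here is a minimal, self-contained Python sketch (all names are hypothetical and the loop IR is a toy; this is not Exo's actual API) of a generic, a target-specific, and a domain-specific scheduling helper:

```python
# Toy loop IR (hypothetical, not Exo's): a loop has an iteration variable,
# a trip count, and a body of ops (strings or nested Loops).
from dataclasses import dataclass

@dataclass
class Loop:
    var: str
    trip: int
    body: list

def split(loop, factor):
    """Generic: valid for any target and any domain -- split a loop by `factor`."""
    assert loop.trip % factor == 0, "split requires an exact factor"
    inner = Loop(loop.var + "i", factor, loop.body)
    return Loop(loop.var + "o", loop.trip // factor, [inner])

def vectorize_avx2(loop):
    """Target-specific (made up): AVX2 holds 8 f32 lanes, so split by 8 and
    tag the inner loop's ops as vector ops."""
    outer = split(loop, 8)
    outer.body[0].body = [("vec_op", op) for op in outer.body[0].body]
    return outer

def tile_stencil(loop, tile):
    """Domain-specific (made up): stencils reuse neighbouring points, so tile
    the iteration space before any target mapping."""
    return split(loop, tile)

# Usage mirrors the mix described below: reshape with the domain operator
# first, apply the target operator right before instruction mapping.
l = Loop("i", 64, ["load", "add", "store"])
l = tile_stencil(l, 16)                 # domain-level reshaping
l.body[0] = vectorize_avx2(l.body[0])   # target-level mapping
```

The point of the sketch is only the division of labour: `split` knows nothing about targets, `vectorize_avx2` bakes in one backend's lane width, and `tile_stencil` encodes a domain heuristic.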

In reality, you probably want a mix of all three: some scheduling operations that take advantage of the application domain, some that take advantage of the target, and some that are generic. The domain operators are used initially to aggressively reshape the program, while the target-specific operators are used right before instruction mapping. However, it seems to me that in order to replace the existing "small DSL + compiler for my accelerator" use case, the second style of abstraction is the important one.

Operationally, I think this doesn't change the short-term goal of having more user-defined scheduling operations without abstraction. However, we could pitch these target abstractions as part of the story too.

rachitnigam commented 1 year ago

A couple of specific examples of these from conversations with @jrk and @gilbo:

Register Allocation

Register allocation is one of those things you don't think about much unless you are doing low-level perf engineering, because it is completely automated in most compilers and, AFAIK, not influenceable from the source-level program.

This kind of automation, from an Exo perspective, is a target-specific abstraction: for each backend, you can imagine writing a register allocation pass that automatically attempts to assign the right registers to all the buffers in your program. Next, you can imagine extending the memory API so that your allocator tells you which registers had to be spilled. For example, this original program:

```
A: f32[10]
B: f32[16]
...
```

gets transformed into:

```
A: f32[10] @ AVX2 & Spilled
B: f32[16] @ AVX2
...
```

However, A is used in a perf-critical section while B is used to move data in and out. We can use .set_memory to change the allocation in our program.

Of course, with this automation and low-level rearrangement, we might even want an analysis that ensures that our given allocation is valid for the target!
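As a hedged sketch of the shape this could take (toy data structures and made-up names, not Exo's memory API): a greedy allocator over named buffers that reports spills, plus a `.set_memory`-style override that pins the perf-critical buffer:

```python
# Toy sketch (hypothetical, not Exo's API): a greedy register allocator over
# named buffers that reports which allocations had to be spilled.

def allocate(buffers, num_regs, pinned=()):
    """buffers: {name: size}. Default heuristic: largest buffers first
    (so B: f32[16] beats A: f32[10]), unless the user pins a buffer via a
    .set_memory-style override. Returns {name: memory annotation}."""
    order = list(pinned) + sorted(
        (b for b in buffers if b not in pinned),
        key=lambda b: -buffers[b])
    alloc, free = {}, num_regs
    for name in order:
        if free > 0:
            alloc[name] = "AVX2"
            free -= 1
        else:
            alloc[name] = "AVX2 & Spilled"  # the allocator reports the spill
    return alloc

bufs = {"A": 10, "B": 16}                   # f32[10], f32[16] from above
auto = allocate(bufs, num_regs=1)           # automatic pass: A gets spilled
fixed = allocate(bufs, num_regs=1, pinned=("A",))  # override: now B spills
```

The automatic pass reproduces the annotated listing above (A spilled, B in registers); pinning A models the `.set_memory` fix, and the spill simply moves to B.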

Register Allocation Analysis

Analyses and abstractions go hand in hand: if we want users to benefit from high-level scheduling operations while still having low-level control over things, we should provide a way to define new analyses. For example, register (or memory) allocation is such a common task that you can imagine providing a way to build a new memory-allocation analysis:

```python
mem_alloc = exo.analysis_builder.MemoryAllocation(
  target = "intel-amx",
  registers = {
    "single-precision": 32,        # Completely made up
    "double-precision": 16,
    "exclusive": False,
  },
)
```

These analyses could then be used in conjunction with high-level scheduling operators to allow fine-grained control with guarantees.
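A self-contained sketch of what such an analysis might check (toy data structures and hypothetical names, not Exo's analysis API): that the buffers a schedule placed in registers actually fit the target's register budget.

```python
# Toy sketch (hypothetical): a memory-allocation analysis that validates an
# allocation against a target's register budget.

def check_allocation(allocation, budget):
    """allocation: {buffer_name: register_class}; budget: {class: count}.
    Returns the register classes that are over budget (empty list = valid)."""
    used = {}
    for reg_class in allocation.values():
        used[reg_class] = used.get(reg_class, 0) + 1
    return [c for c, n in used.items() if n > budget.get(c, 0)]

budget = {"single-precision": 2, "double-precision": 1}  # made-up limits
ok = check_allocation({"A": "single-precision", "B": "single-precision"}, budget)
bad = check_allocation({"A": "double-precision", "B": "double-precision"}, budget)
```

A scheduling operator could run a check like this after every rewrite, which is the "fine-grained control with guarantees" combination described above.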

Vectorization

This is a very general instance of a domain abstraction. However, vectorizers often have to make crucial, target-specific decisions like whether or not to use predication or explicit masks (or operate over an abstract IR).

The Exo approach here could be building a dead-simple, predictable vectorizer that gives up when it sees complex branching code and requires the programmer to pick the right strategy for handling it on the particular backend they want to target.

Of course, on top of this dead-simple vectorizer, you can imagine implementing a more sophisticated vectorizer that takes a list of scheduling operators to try when it gets stuck. This is the power of composition: a tower of abstractions that are all rooted in simple, predictable operators.
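A self-contained sketch of that composition (toy IR where statements are strings; all names are hypothetical, not a real Exo vectorizer): a simple pass that refuses branching code, wrapped by one that tries a list of programmer-supplied rewrites and retries.

```python
# Toy sketch (hypothetical): statements are strings; "if ..." marks branching.

class GiveUp(Exception):
    """Raised when the simple vectorizer hits code it refuses to handle."""

def simple_vectorize(stmts):
    # Dead simple and predictable: vectorize straight-line code, give up on
    # branches rather than silently choosing predication vs. masks.
    if any(s.startswith("if") for s in stmts):
        raise GiveUp("branching code: pick a strategy (predication, masks, ...)")
    return [f"vec({s})" for s in stmts]

def if_to_select(stmts):
    # One programmer-chosen rewrite: turn branches into select ops.
    return [s.replace("if", "select") if s.startswith("if") else s for s in stmts]

def vectorize_with_fallbacks(stmts, rewrites):
    # Sophisticated vectorizer built on the simple one: when stuck, apply each
    # scheduling rewrite in order and retry -- a tower of predictable operators.
    try:
        return simple_vectorize(stmts)
    except GiveUp:
        for rewrite in rewrites:
            try:
                return simple_vectorize(rewrite(stmts))
            except GiveUp:
                continue
        raise

out = vectorize_with_fallbacks(["if x: y = 0", "z = y + 1"], [if_to_select])
```

The outer vectorizer never invents behaviour of its own: everything it does is a composition of the simple pass and rewrites the programmer explicitly supplied.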