Matrix dialect - Githubissues

NeuralCoder3 commented 1 year ago

A simple matrix dialect. It exposes a matrix type Mat: Π [n: .Nat, S: «n; .Nat», T: *] -> * such that Mat (n, s, T) is an n-dimensional tensor with size s0 ... s{n-1}.

The matrix operations are:

shape -- returns the size along the ith dimension
constMat -- returns a new matrix
read -- reads an entry of a matrix
insert -- replaces an entry of a matrix
init -- creates a matrix but does not initializes entries
prod -- computes the matrix-matrix product of two two-dimensional matrices (over a floating point type)
transpose -- transposes a two-dimensional matrix
sum -- sums up all entries of a matrix
mapReduce -- performs an arbitrary mapping and reduction operation (see below)

All operations are inside the memory monad, allowing for a matrix implementation involving side effects. In fact, the current matrices are nested pointers to arrays that are manipulated in-place.

An alternative might be a immutable array implementation like skew binary random access list or one array implementation from haskell (e.g. diff arrays)

MapReduce

mapReduce is inspired by the einstein sum notation and implementations like

Tensorflow / XLA: einsum
Pytorch: einsum
NumPy: einsum
Halide
Haskell: Tensor DSL
Ricci Calculus
Einstein Notation
Pytorch DSL

It takes m matrices, a zero, and a combination function. The combination function takes the accumulated (initially zero) and elements from the input matrices and returns the new accumulator. The result is a matrix.

Pseudocode:

out_matrix = init
for output_indices:
  acc = zero
  for input_indices:
    element_[0..m] = read(matrix[0..m], indices)
    acc = f (acc, elements)
  insert (out_matrix, output_indices, acc)
return out_matrix

Optimization Pipeline

The matrix operations and type are translated using a staging approach that allows intercepting the process at different levels.

High-Level Rewrites

First, high-level operations like transpose, sum, and prod are rewritten into the mapReduce form. To do so, pre-defined functions of the form internal_mapRed_matrix_[name] are looked up. The functions should agree on the type with the corresponding axiom.

High-Level Externalization

Alternatively, certain operations like prod could be dispatched to external libraries like blas. This is however not implemented in the current version.

Medium-Level Lowering

The next step is to lower mapReduce to affine for loops. The conceptual idea corresponds to the pseudocode above.

Low-Level Lowering

The last step is to eliminate all remnants of the matrix dialect. We remove the remaining internal_mapRed_ functions (due to a missing association dialect).

Afterward, we lower the low-level matrix operations and types.

The matrix type is replaced by a pointer to n nested arrays.
init is replaced with alloc
read becomes lea+load
insert becomes lea+store
constMat becomes alloc+pack+store

Low-Level Functional Lowering

We could lower the matrix to a functional array representation like Haskell arrays or random access lists at this point.

Additional Operations

One could implement further operations either deeply or shallowly:

parallel versions of other operations
a specialized map
a fold (functional speak for reduce)
zipWith (a map on two matrices)

Known Issues

Edge cases like zero inputs or outputs are not handled correctly in every case for mapReduce.

NeuralCoder3 commented 1 year ago

@leissa I tried to merge the current master. Currently, the build process already fails at bootstrapping the matrix dialect thorin file:

matrix.thorin:326:18-326:40: error: cannot pass argument '__463975#2:(.Idx 3)' of type '«2; %math.F __453147»' to '%math.arith.mul (.infer (<nullptr>), .infer (<nullptr>)) 0' of domain '«2; %math.F (.infer (<nullptr>), .infer (<nullptr>))»'

Code:

.let R = %math.F (p,e);

.con prod_comb [[mem:%mem.M, acc:R, [a:R, b:R]], ret:.Cn[%mem.M,R]] = {
    .let v = %math.arith.mul 0 (a,b);

I already removed the implicit arguments.

NeuralCoder3 commented 1 year ago

Before merging the matrix dialect we should probably merge ho_codegen (#181), a branch on which this (and ad) builds up upon.

leissa commented 1 year ago

Again some weird things happening during alpha-equiv. Probably related to #174. I will investigate.

NeuralCoder3 commented 1 year ago

The current issue is in lower_matrix_mediumlevel.cpp : counting_for. Specifically, the computation of the accumulator type fails. This probably is caused by uninitialized components in the acc generated in line 241.

Fixed in fd73b1e

leissa commented 1 year ago

Side note: I need to adjust my email settings. Sometimes I only see that you tagged me as a reviewer days later ... Sorry, for that.

AnyDSL / MimIR

Matrix dialect #180

MapReduce

Optimization Pipeline

High-Level Rewrites

High-Level Externalization

Medium-Level Lowering

Low-Level Lowering

Low-Level Functional Lowering

Additional Operations

Known Issues