chsasank / llama.lisp

Lisp dialect designed for HPC and AI
GNU Lesser General Public License v2.1
14 stars 6 forks source link

Support for multi-dimensional arrays #16

Closed GlowingScrewdriver closed 2 months ago

GlowingScrewdriver commented 4 months ago

Currently, brilisp supports multi-dimensional pointers; however, support for contiguous multi-dimensional arrays is absent. LLVM and C both support multi-dimensional arrays.

In order to make use of the facilty in LLVM, array bounds need to be captured from the BRIL program. Additionally, LLVM allows these bounds to be supplied with the getelementpointer instruction along with a list of indices -- presumably to specify multiple levels of de-referencing in one instruction.

However, BRIL currently has no way of specifying an array with multiple dimensions. BRIL's alloc instruction takes only a single variable as an argument, which specifies the size of the allocated memory. Perhaps this is because BRIL's allocation is dynamic, and thus array sizes are not known at compile-time. Thus we will have to extend the alloc instruction to capture more than one array dimension.

GlowingScrewdriver commented 4 months ago

An example of a multi-dimensional array in C:

void fn (int n) {
    char arr [3][4];
    char a = arr[2][3];
    putchar(a);
}

The LLVM generated by Clang:

define dso_local void @fn(i32 noundef %0) #0 {
  %2 = alloca i32, align 4
  %3 = alloca [3 x [4 x i8]], align 1
  %4 = alloca i8, align 1
  store i32 %0, ptr %2, align 4
  %5 = getelementptr inbounds [3 x [4 x i8]], ptr %3, i64 0, i64 2
  %6 = getelementptr inbounds [4 x i8], ptr %5, i64 0, i64 3
  %7 = load i8, ptr %6, align 1
  store i8 %7, ptr %4, align 1
  %8 = load i8, ptr %4, align 1
  %9 = sext i8 %8 to i32
  %10 = call i32 (i32, ...) @putchar(i32 noundef %9)
  ret void
}

Note the usage of the inbounds keyword with getelementptr, and the format of the size argument with alloca

GlowingScrewdriver commented 4 months ago

Looks like multi-dimensional arrays with fixed and variable sizes must be handled differently.

For fixed sizes, the array dimensions are encoded in the array type itself:

char arr [2][3];
/*...*/
char a = arr [1][2];

in C becomes

%5 = alloca [2 x [3 x i8]], align 16
;...;
%7 = getelementptr inbounds [6 x [3 x i8]], ptr %5, i64 0, i64 1
%8 = getelementptr inbounds [3 x i8], ptr %7, i64 0, i64 2
%9 = load i8, ptr %8, align 1

in LLVM.

However, with variable sizes, the frontend seems to handle the indexing, while LLVM only sees a 1D array:

char arr[m][n];
char a = arr[1][2];

becomes

%5 = alloca ptr, align 8
%14 = mul nuw i64 %10, %12  ; %10 and %12 are n and m
%15 = alloca i8, i64 %14, align 16
;...;
%16 = mul nsw i64 1, %12
%17 = getelementptr inbounds i8, ptr %15, i64 %16
%18 = getelementptr inbounds i8, ptr %17, i64 2
%19 = load i8, ptr %18, align 1
GlowingScrewdriver commented 4 months ago

What this boils down to is this: do we want to have extra syntax that makes it natural to use multi-dimensional arrays?

Brilisp already has enough functionality for the program to define the logic for multi-dimensional array access. Question is, do we want to add syntax to abstract away that?

If so, we are looking at implementing strided arrays. We cannot offload this to LLVM in the case of variable-sized multi-dimensional arrays. This talk has an overview on what striding is.

chsasank commented 4 months ago

We need to judge the advantages of supporting something like this in BRIL.

%5 = alloca [2 x [3 x i8]], align 16

instead of just doing alloca 6 i8. Is it the simpler syntax with ptradd? Let's refer to LLVM documentation of alloca and understand why LLVM has this feature in the first place. Surprisingly this whole thing is not mentioned in docs.

GlowingScrewdriver commented 4 months ago

The square brackets notation of the array size is actually the ArrayType syntax: https://llvm.org/docs/LangRef.html#array-type

chsasank commented 4 months ago

So this settles it. We're not dealing with special syntax of alloca here. We're dealing with special types! From LLVM docs:

Aggregate Types are a subset of derived types that can contain multiple member types. Arrays and structs are aggregate types.

So what alloca [2 x [3 x i8]] is doing is creating element of size 1 and of aggregate type 'array' [2 x [3 x i8]]. This nested arrays is a type that we don't support yet. In fact this issue is related to supporting structs and other types as in #13. Let's keep the issue open and implement array types along with other types later.

GlowingScrewdriver commented 4 months ago

Until we have support for an array type, this goes beyond the scope of "Modelling LLVM".

chsasank commented 4 months ago

This is type problem not an alloc problem. We'll revisit later.

chsasank commented 2 months ago

We decided not to implement array types as of now. Multi dimensional arrays will be implemented by strides etc