calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Relay: Support tensors #184

Closed sampsyo closed 3 years ago

sampsyo commented 3 years ago

From #182. A good next step would be to support a Relay program like this:

fn (%x: Tensor[(4, 4), float32], %y: Tensor[(4, 4), float32]) {
  add(%x, %y)
}

This will require generating and interacting with memories to hold the tensors, as we do in the Dahlia frontend.

rachitnigam commented 3 years ago

Minor nit: Probably don't want the type to be floats. Take a look at the mem primitives in the library.

ViviYe commented 3 years ago

I am trying to implement adding 2 tensors and I am kinda stuck on what to return when visiting a tensor variable. Should I just return the name of the cell, since there is no .out for a std mem?

rachitnigam commented 3 years ago

Good question. Short answer: probably.

Long answer: The problem here is differences in what is considered a value in FuTIL vs TVM. TVM says that tensors are "just" values and you can pass them anywhere you want. However, for FuTIL (and hardware in general), memories are pieces of circuitry that can't just be passed around. Worse still, unlike std_reg, returning the cell might be incorrect since you have to correctly synthesize other groups to use that memory.

The thing to start with is probably writing down an example here and then imagining what the generated FuTIL program should look like.

ViviYe commented 3 years ago

I tried compiling the dahlia program

decl A: ubit<32>[8];
decl B: ubit<32>[8];
decl v: ubit<32>[8];
for (let i: ubit<4> = 0..8) {
  v[i] := A[i] +  B[i];
}

If I understood correctly, when add is called on 2 tensors, we will need to

If we just return the name of the hardware, we won't have the sizes or dimensions of the arrays available at visit_call :( We will also have to somehow mark it as a mem object instead of a scalar variable. I just couldn't come up with a nice and clean way to handle this 😢

rachitnigam commented 3 years ago

Right, that's precisely the problem. There is one other possible solution. If you compile the generated Dahlia program with the futil compiler and give the compiler -p external, it will rename all memory accesses into reads from the ports of the component.

The point of the -p external thing is to tell the compiler that "this memory comes from the outer world. here are wires that let you interact with it". Maybe it'll be fruitful to think about memories in that way.

sampsyo commented 3 years ago

If we just return the name of the hardware, we won't have the sizes or dimensions of the arrays available at visit_call :( We will also have to somehow mark it as a mem object instead of a scalar variable. I just couldn't come up with a nice and clean way to handle this 😢

A couple of ideas come to mind for this:

ViviYe commented 3 years ago

For using check_type, does that mean to check the expr type before even calling visit on the add arguments?

sampsyo commented 3 years ago

Hmm, I'm not sure the order matters… it seems like you could recursively call the emitter to generate code for both arguments, and then use their type to decide how to use them?

ViviYe commented 3 years ago

That makes sense! So visiting the call will be independent of the type checking :) Depending on the tensor dimension returned from type checking the child expression, I then determine whether to use a memory or a register?

sampsyo commented 3 years ago

Yes, that sounds about right to me!
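
A minimal sketch of that decision, for illustration only. It assumes the shape has already been obtained from type checking and is passed in as a plain tuple; CellSpec and cell_for_type are hypothetical names, not part of the frontend. It also assumes that the third parameter of std_mem_d1 is the index width, which is consistent with the std_mem_d1(32, 4, 2) cells that appear later in this thread but is not confirmed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CellSpec:
    """Hypothetical description of the hardware cell backing a Relay value."""
    primitive: str            # e.g. "std_reg" or "std_mem_d1"
    params: Tuple[int, ...]   # parameters passed to the primitive


def cell_for_type(shape: Tuple[int, ...], bitwidth: int = 32) -> CellSpec:
    """Pick a register for scalars and a 1-D memory for rank-1 tensors."""
    if len(shape) == 0:
        # A scalar is held in a register.
        return CellSpec("std_reg", (bitwidth,))
    if len(shape) == 1:
        size = shape[0]
        # Assumed index width: enough bits to address `size` elements, at least 1.
        idx_bits = max(1, (size - 1).bit_length())
        return CellSpec("std_mem_d1", (bitwidth, size, idx_bits))
    # Higher ranks are left out of this sketch.
    raise NotImplementedError(f"rank {len(shape)} tensors are not handled here")


scalar_cell = cell_for_type(())
tensor_cell = cell_for_type((4,))
```

Because the choice depends only on the type, it can be made after both arguments have been visited, which matches the point above that the visiting order does not matter.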

ViviYe commented 3 years ago

I have another question now: at the function level, I made a mem cell for ret based on the return type. Somehow I will have to connect ret to the output of add (which we don't know will be the return value when we call visit_call). I assume we can always just copy the memory, but is there any better way to deal with it 🤓?

sampsyo commented 3 years ago

That's a good question—to confirm, the issue is that you visit the add, and all that goes great, and it produces a memory. But only after doing that, your visitor then realizes that this add was the return value (i.e., the "top" of the expression tree). So how to hook up the previously-created result memory to the ret memory?

If that's the case, then here is one super-dumb strategy:

Does that make sense?

ViviYe commented 3 years ago

I am not sure I understood completely. Take the following program:

fn(%x, %y, %z){
   let %v = add(%x ,%y)
   add(%v, %z)
}

The AST will be Let -> Call so add(%x ,%y) will be visited before add(%v, %z) and we would want to create a new memory for add(%x ,%y)?

ViviYe commented 3 years ago

except for the handling ret, this is what I have so far 😃

fn (%x: Tensor[(4), int32], %y: Tensor[(4), int32])
    -> Tensor[(4), int32] {
  add(%x, %y)
}

import "primitives/std.lib";
component main() -> () {
  cells {
    x = prim std_mem_d1(32, 4, 2);
    y = prim std_mem_d1(32, 4, 2);
    constant0 = prim std_const(32, 0)
    constant1 = prim std_const(32, 1)
    ret  = prim std_mem_d1(32, 4, 2);
    add1 = prim std_add(32);
    add4 = prim std_add(32);
    mem0 = prim std_mem_d1(32, 4, 2)
    const2 = prim std_const(32, 4)
    i3 = prim std_reg(32)
    le5 = prim std_le(32)
  }
  wires {
    group group10 {
      ret.in = mem0;
      ret.write_en = None;
      group10[done] = ret[done];
    }
    group cond6 {
      cond6[done] = 1'd1
      le5.left = i3.out
      le5.right = const2.out
    }
    group initalize7 {
      i3.in = constant0.out
      i3.write_en = 1'd1
      initalize7[done] = i3.done
    }
    group body8 {
      mem0.addr0 = i3.out
      mem0.write_en = 1'd1
      add1.left = x.read_data
      add1.right = y.read_data
      x.addr0 = i3.out
      y.addr0 = i3.out
      mem0.write_data = 1'd1 ? add1.out
      body8[done] = mem0.done ? 1'd1
    }
    group update9 {
      i3.write_en = 1'd1
      add4.left = i3.out
      add4.right = constant1.out
      i3.in = 1'd1 ? add4.out
      update9[done] = i3.done ? 1'd1
    }
  }
  control {
    seq {
      initalize7
      while le5.out with cond6 {
        seq {
          body8
          update9
        }
      }
      group10
    }
  }
}

sampsyo commented 3 years ago

Ah, I see! Yeah, I didn't quite put together that the order would be "inverted" for let. For the specific case of let, maybe the thing to do is this for the visit_let case:

How about that?

ViviYe commented 3 years ago

that makes sense! thank you!

ViviYe commented 3 years ago

Should the flag be passed down the visit functions as a parameter? I have a stupid question 😢: since we are overriding the visiting functions from the parent class, I am not sure how to add an additional parameter?

sampsyo commented 3 years ago

Right! I was thinking that the way to get around that would be to just assign a field on the visitor object, like self.is_ret. It's not pretty but it'll probably work!
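
A rough sketch of the field-on-the-visitor idea, under loud assumptions: ToyEmitter, its tuple-based AST, and the (op, sources, destination) triples are all hypothetical stand-ins, not TVM's visitor classes or the real frontend. The only point is that the visit methods keep their one-argument signature while self.is_ret carries the "this is the return value" information.

```python
class ToyEmitter:
    """Toy visitor over a hypothetical tuple-based AST (not TVM's ExprFunctor)."""

    def __init__(self):
        self.is_ret = False   # True while visiting the expression whose value is returned
        self.env = {}         # let-bound variable name -> memory holding its value
        self.ops = []         # emitted (op, sources, destination) triples
        self.counter = 0

    def fresh_mem(self):
        name = f"mem{self.counter}"
        self.counter += 1
        return name

    def visit(self, expr):
        # Dispatch on the node tag; every visit method takes a single argument.
        return getattr(self, "visit_" + expr[0])(expr)

    def visit_function(self, expr):
        _, body = expr
        self.is_ret = True            # the function body produces the return value
        return self.visit(body)

    def visit_var(self, expr):
        _, name = expr
        return self.env.get(name, name)

    def visit_let(self, expr):
        _, var, value, body = expr
        # The bound value is never the return value; only the body can be.
        is_ret, self.is_ret = self.is_ret, False
        self.env[var] = self.visit(value)
        self.is_ret = is_ret
        return self.visit(body)

    def visit_call(self, expr):
        _, op, args = expr
        # Read and clear the flag so the arguments are not treated as the return value.
        is_ret, self.is_ret = self.is_ret, False
        sources = tuple(self.visit(arg) for arg in args)
        dest = "ret" if is_ret else self.fresh_mem()
        self.ops.append((op, sources, dest))
        return dest


# fn(%x, %y, %z) { let %v = add(%x, %y); add(%v, %z) }
program = ("function",
           ("let", "v", ("call", "add", [("var", "x"), ("var", "y")]),
            ("call", "add", [("var", "v"), ("var", "z")])))
emitter = ToyEmitter()
result = emitter.visit(program)
```

In this toy version the inner add writes to a fresh memory and the outer add writes directly into ret, so no copy is needed for that program.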

ViviYe commented 3 years ago

I think I got it working with is_ret but there is one edge case I think:

fn(%x, %y, %z){
   let %v = add(%x ,%y)
   %v
}

In this case %v is the return value but we won't know at the visit_call time?

sampsyo commented 3 years ago

Yeah, that is a good call. For these, I think the right thing to do is probably to emit a copy. Then one can rely on higher-level optimizations at the Relay layer (i.e., a sensible Relay optimization might rewrite that into just add(%x, %y), removing the let altogether).
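
A small illustration of the copy fallback, again only a sketch: the (op, sources, destination) triples, emit_return, and run are hypothetical, and Python lists stand in for memories. The 32-bit wraparound in add is an assumption chosen to mirror the 32-bit cells used earlier in the thread.

```python
def emit_return(result, ops):
    """Append a copy into `ret` unless the result was already written there."""
    if result != "ret":
        ops.append(("copy", (result,), "ret"))


def run(ops, memories):
    """Reference interpreter for the toy ops, using lists as memories."""
    for op, sources, dest in ops:
        if op == "add":
            left, right = (memories[name] for name in sources)
            # Assumed 32-bit unsigned arithmetic, element by element.
            memories[dest] = [(a + b) % 2**32 for a, b in zip(left, right)]
        elif op == "copy":
            memories[dest] = list(memories[sources[0]])
        else:
            raise ValueError(f"unknown op: {op}")
    return memories


# fn(%x, %y, %z) { let %v = add(%x, %y); %v }
# The add was emitted into mem0 before anyone knew %v would be returned.
ops = [("add", ("x", "y"), "mem0")]
emit_return("mem0", ops)
memories = run(ops, {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
```

The extra copy is redundant work, which is why the comment above leaves its removal to a rewrite at the Relay level rather than to the emitter.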

ViviYe commented 3 years ago

I fixed the write_en and updated the tests. I think tensor add and subtract are mostly working on my branch relay2futil! Would trying to implement a simple operator in Dahlia be a good next step?

sampsyo commented 3 years ago

Hey; that's awesome!! Can you open a PR with the new features and we'll give it a shot?

I do think that implementing something simple in Dahlia would be a great next step. Maybe a ReLU, just to keep it simple at first?

ViviYe commented 3 years ago

YAY I will try to make sure all the outputs compile with the FuTIL compiler and then make a PR!!

ViviYe commented 3 years ago

Made a PR!

cgyurgyik commented 3 years ago

Is this issue still open? What is the PR # linked to this?

rachitnigam commented 3 years ago

I don’t see anything obvious. There is a test with the relay compiler that seems to use tensors.

sampsyo commented 3 years ago

That would be #196! Also feel free to ping @ViviYe to get her help understanding where she left things off.