Closed: sampsyo closed this issue 3 years ago.
Minor nit: Probably don't want the type to be floats. Take a look at the mem primitives in the library.
I am trying to implement adding two tensors and I am stuck on what to return when visiting a tensor variable. Should I just return the name of the cell, since there is no `.out` port for a `std_mem`?
Good question. Short answer: probably.
Long answer: The problem here is differences in what is considered a value in FuTIL vs. TVM. TVM says that tensors are "just" values and you can pass them anywhere you want. However, for FuTIL (and hardware in general), memories are pieces of circuitry that can't just be passed around. Worse still, unlike `std_reg`, returning the cell might be incorrect, since you have to correctly synthesize other groups to use that memory.
The thing to start with is probably writing down an example here and then imagining what the generated FuTIL program should look like.
I tried compiling this Dahlia program:

```
decl A: ubit<32>[8];
decl B: ubit<32>[8];
decl v: ubit<32>[8];
for (let i: ubit<4> = 0..8) {
  v[i] := A[i] + B[i];
}
```
If I understood correctly, when add is called on two tensors, we will need to generate something like the loop above.
If we just return the name of the hardware, we won't have the sizes or dimensions of the arrays available at `visit_call` :( We will also have to somehow mark it as a memory object instead of a scalar variable. I just couldn't come up with a nice and clean way to handle this 😢
Right, that's precisely the problem. There is one other possible solution. If you compile the generated Dahlia program with the FuTIL compiler and pass it `-p external`, it renames all memory accesses into reads from the ports of the component. The point of the `-p external` flag is to tell the compiler that "this memory comes from the outer world; here are wires that let you interact with it". Maybe it'll be fruitful to think about memories in that way.
> If we just return the name of the hardware, we won't have the sizes or dimensions of the arrays available at `visit_call` :( We will also have to somehow mark it as a memory object instead of a scalar variable. I just couldn't come up with a nice and clean way to handle this 😢
A couple of ideas come to mind for this:

1. Keep an auxiliary map, say `mem_info`. So whenever you add a memory to the structure (cells), you also add information about it to this map for later use.
2. Use `expr.checked_type` (cf. reference) to look up the type of the array being used in an expression. Then that could tell you something about the corresponding memory.

For using `checked_type`, does that mean to check the expr type before even calling visit on the add arguments?
Hmm, I'm not sure the order matters… it seems like you could recursively call the emitter to generate code for both arguments, and then use their type to decide how to use them?
That makes sense! So the visit call will be independent from the type checking :) Depending on the tensor dimension returned from type-checking the child expression, I then determine whether to use a memory or a register?
Yes, that sounds about right to me!
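A minimal Python sketch of this idea. The `TensorType`/`Expr` classes here are made-up stand-ins for TVM's Relay AST (where an annotated expression carries a `checked_type`), and `cell_for` is a hypothetical helper, not actual relay2futil code; only the dispatch-on-type pattern is the point:

```python
# Stand-ins for Relay's type-annotated AST nodes (illustrative only).
class TensorType:
    def __init__(self, shape, dtype="int32"):
        self.shape = shape      # e.g. [4] for a 1-D tensor, [] for a scalar
        self.dtype = dtype

class Expr:
    def __init__(self, checked_type):
        self.checked_type = checked_type

def cell_for(expr):
    """Pick a FuTIL cell kind based on the expression's checked type."""
    ty = expr.checked_type
    if getattr(ty, "shape", None):        # non-empty shape => tensor value
        return ("std_mem_d1", ty.shape)   # back it with a memory
    return ("std_reg", [])                # scalars live in registers

scalar = Expr(TensorType([]))
vector = Expr(TensorType([4]))
print(cell_for(scalar)[0])  # std_reg
print(cell_for(vector)[0])  # std_mem_d1
```

So the emitter can recurse on the arguments first and only then consult their types to decide how to wire them up.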
I have another question now: at the function level, I made a mem cell for `ret` based on the return type. Somehow I will have to connect `ret` to the output of `add` (which we don't know will be the return value when we call `visit_call`). I assume we can always just copy the memory, but is there a better way to deal with it 🤓?
That's a good question. To confirm: the issue is that you visit the `add`, and all that goes great, and it produces a memory. But only after doing that, your visitor then realizes that this `add` was the return value (i.e., the "top" of the expression tree). So how to hook up the previously-created result memory to the `ret` memory?
If that's the case, then here is one super-dumb strategy:

1. Add a flag to your visitor, say `is_ret`.
2. When visiting a function, set `is_ret` to True.
3. In `visit_call`, check whether `is_ret` is set. If it is, then don't create a new memory for the `add` call; instead, use the existing `ret` memory. If not, create a new memory as before. Regardless, set `is_ret` to False for future recursive calls.

Does that make sense?
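A minimal sketch of that strategy, with made-up `Call`/`Visitor` classes (this is not the actual relay2futil visitor); only the flag bookkeeping mirrors the steps above:

```python
# Illustrative stand-in for a Relay call node.
class Call:
    def __init__(self, op, args):
        self.op, self.args = op, args

class Visitor:
    def __init__(self):
        self.is_ret = False   # true only while visiting the "top" expression
        self.cells = []       # names of memories created so far

    def visit_function(self, body):
        self.is_ret = True    # the function body is the return value
        return self.visit_call(body)

    def visit_call(self, call):
        use_ret = self.is_ret
        self.is_ret = False           # nested calls are never the return value
        for arg in call.args:
            if isinstance(arg, Call):
                self.visit_call(arg)
        if use_ret:
            return "ret"              # reuse the existing `ret` memory
        name = f"mem{len(self.cells)}"
        self.cells.append(name)       # otherwise allocate a fresh memory
        return name

v = Visitor()
outer = Call("add", [Call("add", [])])
print(v.visit_function(outer))  # ret   (top-level call targets `ret`)
print(v.cells)                  # ['mem0']  (the nested call got a new memory)
```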
I am not sure if I understood completely. For the following program:

```
fn(%x, %y, %z) {
  let %v = add(%x, %y)
  add(%v, %z)
}
```

The AST will be `Let -> Call`, so `add(%x, %y)` will be visited before `add(%v, %z)`, and we would want to create a new memory for `add(%x, %y)`?
Except for handling `ret`, this is what I have so far 😃
```
fn (%x: Tensor[(4), int32], %y: Tensor[(4), int32])
    -> Tensor[(4), int32] {
  add(%x, %y)
}
```
```
import "primitives/std.lib";
component main() -> () {
  cells {
    x = prim std_mem_d1(32, 4, 2);
    y = prim std_mem_d1(32, 4, 2);
    constant0 = prim std_const(32, 0)
    constant1 = prim std_const(32, 1)
    ret = prim std_mem_d1(32, 4, 2);
    add1 = prim std_add(32);
    mem0 = prim std_mem_d1(32, 4, 2)
    const2 = prim std_const(32, 4)
    i3 = prim std_reg(32)
    le5 = prim std_le(32)
  }
  wires {
    group group10 {
      ret.in = mem0;
      ret.write_en = None;
      group10[done] = ret[done];
    }
    group cond6 {
      cond6[done] = 1'd1
      le5.left = i3.out
      le5.left = const2.out
    }
    group initalize7 {
      i3.in = constant0.out
      i3.write_en = 1'd1
      initalize7[done] = i3.done
    }
    group body8 {
      mem0.addr0 = i3.out
      mem0.write_en = 1'd1
      add1.left = x.read_data
      add1.right = y.read_data
      x.addr0 = i3.out
      y.addr0 = i3.out
      mem0.write_data = 1'd1 ? add1.out
      body8[done] = mem0.done ? 1'd1
    }
    group update9 {
      i3.write_en = 1'd1
      add4.left = i3.out
      add4.right = constant1.out
      i3.in = 1'd1 ? add4.out
      update9[done] = i3.done ? 1'd1
    }
  }
  control {
    seq {
      while le0.out with cond0 {
        initalize7
        seq {
          body8
          update9
        }
      }
      group10
    }
  }
}
```
Ah, I see! Yeah, I didn't quite put together that the order would be "inverted" for `let`. For the specific case of `let`, maybe the thing to do is this for the `visit_let` case:

1. If `is_ret` is false, leave it false.
2. If `is_ret` is true, turn it to false for visiting the bound expression (`add(%x, %y)` above) but then set it back to true when visiting the body (`add(%v, %z)`).

How about that?
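Sketched in Python, with made-up `Let`/`Call` classes standing in for the Relay AST (not the real relay2futil code); only the save/restore of the flag is the point:

```python
# Illustrative stand-ins for Relay nodes.
class Call:
    def __init__(self, op):
        self.op = op

class Let:
    def __init__(self, value, body):
        self.value, self.body = value, body

class Visitor:
    def __init__(self):
        self.is_ret = False
        self.trace = []   # (op, was_return?) pairs, for illustration

    def visit(self, node):
        if isinstance(node, Let):
            return self.visit_let(node)
        self.trace.append((node.op, self.is_ret))
        self.is_ret = False

    def visit_let(self, let):
        saved = self.is_ret
        self.is_ret = False   # the bound expression is never the return value
        self.visit(let.value)
        self.is_ret = saved   # the let body inherits the original flag
        self.visit(let.body)

v = Visitor()
v.is_ret = True   # as if we just entered a function body
v.visit(Let(Call("add_xy"), Call("add_vz")))
print(v.trace)  # [('add_xy', False), ('add_vz', True)]
```

So in `fn(%x, %y, %z) { let %v = add(%x, %y); add(%v, %z) }`, only the body call ends up targeting `ret`.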
that makes sense! thank you!
Should the flag be passed down the visit functions as a parameter? I have a stupid question 😢: since we are overriding the visiting functions from the parent class, I am not sure how to add an additional parameter?
Right! I was thinking that the way to get around that would be to just assign a field on the visitor object, like `self.is_ret`. It's not pretty but it'll probably work!
I think I got it working with `is_ret`, but there is one edge case, I think:

```
fn(%x, %y, %z) {
  let %v = add(%x, %y)
  %v
}
```

In this case `%v` is the return value, but we won't know that at `visit_call` time?
Yeah, that is a good call. For these, I think the right thing to do is probably to emit a copy. Then one can rely on higher-level optimizations at the Relay layer (i.e., a sensible Relay optimization might rewrite that into just `add(%x, %y)`, removing the `let` altogether).
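A tiny sketch of that fallback: a hypothetical `visit_var` helper (names here are illustrative, not real relay2futil code) that records a copy into `ret` when the bare variable is the return value:

```python
# Hypothetical helper: decide what to emit when visiting a bare variable.
def visit_var(name, is_ret):
    if is_ret:
        # In real FuTIL this would be a loop copying each element of the
        # source memory into `ret`; here we just record the intent.
        return ("copy", name, "ret")
    return ("use", name)   # otherwise, just reference the existing memory

print(visit_var("v", is_ret=True))   # ('copy', 'v', 'ret')
print(visit_var("v", is_ret=False))  # ('use', 'v')
```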
I fixed the `write_en` handling and updated tests. I think tensor add and subtract are mostly working on my branch relay2futil! Would trying to implement a simple operator in Dahlia be a good next step?
Hey; that's awesome!! Can you open a PR with the new features and we'll give it a shot?
I do think that implementing something simple in Dahlia would be a great next step. Maybe a ReLU, just to keep it simple at first?
YAY! I will try to make sure all the outputs compile with the FuTIL compiler and then make a PR!!
Made a PR!
Is this issue still open? What is the PR # linked to this?
I don’t see anything obvious. There is a test with the relay compiler that seems to use tensors.
That would be #196! Also feel free to ping @ViviYe to get her help understanding where she left things off.
From #182. A good next step would be to support a Relay program like this:
This will require generating and interacting with memories to hold the tensors, as we do in the Dahlia frontend.