StanfordAHA / Halide-to-Hardware

Other
74 stars 12 forks source link

Error when doing make run-clockwork on new max pooling example #97

Open dillonhuff opened 3 years ago

dillonhuff commented 3 years ago

@jeffsetter @thenextged I've gotten the CPU code compiling and running for my max-pooling example (https://github.com/StanfordAHA/Halide-to-Hardware/tree/maxpool_example/apps/hardware_benchmarks/apps/max_pool_2x2), but when I run make run-clockwork I get the following error:

dhuff@kiwi:~/h2h2clockwork/Halide-to-Hardware/apps/hardware_benchmarks/apps/max_pool_2x2$ make run-clockwork
g++-7 -std=c++17 -I../../../../../clockwork -I../../../../../clockwork/include -I/home/dhuff/h2h2clockwork/Halide-to-Hardware/clockwork/barvinok-0.41/isl -fPIC -I/home/dhuff/h2h2clockwork/clockwork/barvinok-0.41/isl/ -c bin/clockwork_codegen.cpp -o bin/clockwork_codegen.o
In file included from bin/clockwork_codegen.cpp:2:0:
bin/maxpool_memory.cpp: In function ‘prog maxpool()’:
bin/maxpool_memory.cpp:11:27: error: ‘arg_0’ was not declared in this scope
 int32_t &hw_output_s0_c = arg_0;
                           ^~~~~
../../hw_support/hardware_targets.mk:138: recipe for target 'bin/clockwork_codegen.o' failed
make: *** [bin/clockwork_codegen.o] Error 1

Any idea what is going on here? Thanks!

gednyengs commented 3 years ago

Let me take a look (although I think Jeff might be of more help with this error)

gednyengs commented 3 years ago

Hmm. I'm getting a different error. Are bin/clockwork_testscript.h and bin/clockwork_testscript.cpp generated when you run-clockwork ?

dillonhuff commented 3 years ago

I've seen several different errors depending on the order in which I call the make commands.

Yes I see bin/clockwork_testscript.h and bin/clockwork_testscript.cpp in bin after I run make run-clockwork.

The other error I sometimes get is:

[INFO] Module.compile(): clockwork_source_name = ./bin/maxpool_clockwork.cpp
creating file from name: maxpool
[INFO] Module.compile(): clockwork header file = ./bin/maxpool.h
creating file from name: maxpool
outputting clockwork target named maxpool
Error at ../../../../distrib/tools/GenGen.cpp:4:
Can't represent an integer with this many bits in C: (void *)
../../hw_support/hardware_targets.mk:132: recipe for target 'bin/maxpool_memory.cpp' failed
make: *** [bin/maxpool_memory.cpp] Aborted (core dumped)
make: *** Deleting file 'bin/maxpool_memory.cpp'

What error are you seeing?

gednyengs commented 3 years ago

Ah! that's the error I'm getting. I'm tracing it now

gednyengs commented 3 years ago

I just pushed a temporary fix to this maxpool_example branch. can you pull this and try again?

dillonhuff commented 3 years ago

@thenextged trying now.

gednyengs commented 3 years ago

@jeffsetter The _hls_target ProducerConsumer is getting inserted at the wrong place in this app. The outer loop level is outside the _hls_target PC node (which should never happen). I think the right thing to do is to make sure that the _hls_target PC node is the first child of the output function's PC node

gednyengs commented 3 years ago

For now, I went back to using the name hw_output as the accelerator boundary

dillonhuff commented 3 years ago

@thenextged is there an issue with the way the app itself is written that I could fix? Or is the problem in code generation?

dillonhuff commented 3 years ago

@thenextged just finished running and comparing max pool and they diff the same. Thank you!

gednyengs commented 3 years ago

Great! I think the app's Halide code is fine. The codegen fix shouldn't be too bad (I'll let Jeff comment on that)

jeffsetter commented 3 years ago

To adjust the loops, you can use hw_output.reorder(xi, yi, c, xo, yo) This moves the loops so that the channel loop is within the accelerator (which includes loops between xi and xo.

Patch that evaluates correctly is:


index e2238b484..390c303db 100644
--- a/apps/hardware_benchmarks/apps/max_pool_2x2/maxpool_generator.cpp
+++ b/apps/hardware_benchmarks/apps/max_pool_2x2/maxpool_generator.cpp
@@ -77,6 +77,7 @@ public:

           hw_output
             .tile(x, y, xo, yo, xi, yi, imgSize, imgSize)
+            .reorder(xi, yi, c, xo, yo)
             .hw_accelerate(xi, xo);

           pooled.compute_at(hw_output, xo);
diff --git a/src/CodeGen_RDAI.cpp b/src/CodeGen_RDAI.cpp
index f102879e5..b7e658b66 100644
--- a/src/CodeGen_RDAI.cpp
+++ b/src/CodeGen_RDAI.cpp
@@ -163,7 +163,8 @@ void CodeGen_RDAI::visit(const Call *op) {
 }

 void CodeGen_RDAI::visit(const ProducerConsumer *op) {
-    string target_prefix = "hw_output";
+  //string target_prefix = "hw_output";
+  string target_prefix = "_hls_target";
     if(starts_with(op->name, target_prefix) && op->is_producer) {
         Stmt hw_body = substitute_in_all_letstmts(op->body);
gednyengs commented 3 years ago

@jeffsetter both of these loop nests should be valid 1: (xi, yi, xo, yo, c) and 2: (xi, yi, c, xo, yo)) right? I think if we make sure that all For nodes in-between hw_output and _hls_target PC nodes are moved inside the _hls_target PC node, then all will be good. Otherwise, we're limiting ourselves in terms of schedules

jeffsetter commented 3 years ago

Yea I think both loop nests should be allowed. If you want to have all of the loops done on the accelerator you can include hw_accelerate(xi, c) with reorder(xi, yi, xo, yo, c).

However, I'm not sure why all of those loops need to be in the accelerator. Is there some assumption in that loops can't be outside the hls_target

gednyengs commented 3 years ago

Yeah the issue is I'm not supporting scalar variables as accelerator function parameters yet (which we will need in this case to pass the current value of c if it is outside the accelerator). A better solution if for me to add support of that. But I might not get to it at least until after I get the FPGA flow to work

gednyengs commented 3 years ago

Or as an alternative, I can emit crops/slices and rewrite the IR to collapse the external dimensions. This way we only keep buffer arguments. Any ideas about other/better alternatives?

Also, I want to make sure that the strategy I choose here is compatible with your target generator