Open hecmay opened 4 years ago
Do we translate the parallel() primitive to a corresponding pragma in HLS?
Currently the parallel primitive is only for CPU, where it triggers multi-threaded execution.
As I mentioned before, we need to support it for hardware synthesis. Shall we open another issue? If not, this will fall through the cracks again.
It's ignored in the HLS code generator. I am considering having CodeGenC translate .parallel() to OpenMP pragmas.
And for HLS codegen, could we use .parallel() to perform kernel replication and exploit data-level parallelism?
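To make the proposal concrete, here is a rough sketch of how a backend could lower a loop marked parallel. This is a toy emitter, not HeteroCL's CodeGenC: the Loop class, the emit function, and the factor parameter are all hypothetical names made up for illustration. It also assumes that an unroll pragma is an acceptable stand-in for kernel replication on the HLS side, which may not be what we end up wanting.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """Hypothetical loop descriptor, not a HeteroCL type."""
    var: str                # loop variable name
    extent: int             # trip count
    parallel: bool = False  # set when the schedule marks this axis parallel


def emit(loops, body, target="cpu", factor=4):
    """Render a loop nest as C source, annotating parallel loops per target."""
    lines = []
    for depth, loop in enumerate(loops):
        pad = "  " * depth
        header = (f"{pad}for (int {loop.var} = 0; "
                  f"{loop.var} < {loop.extent}; ++{loop.var}) {{")
        if loop.parallel and target == "cpu":
            # OpenMP: the pragma goes directly before the for statement.
            lines.append(f"{pad}#pragma omp parallel for")
            lines.append(header)
        elif loop.parallel and target == "hls":
            # HLS: the pragma goes inside the loop body.
            # Assumption: unrolling stands in for kernel replication here.
            lines.append(header)
            lines.append(f"{pad}  #pragma HLS unroll factor={factor}")
        else:
            lines.append(header)
    lines.append("  " * len(loops) + body)
    # Close the loops from innermost to outermost.
    for depth in reversed(range(len(loops))):
        lines.append("  " * depth + "}")
    return "\n".join(lines)


if __name__ == "__main__":
    nest = [Loop("i", 10, parallel=True), Loop("j", 8)]
    print(emit(nest, "out[i][j] = in[i][j] + 1;", target="cpu"))
    print(emit(nest, "out[i][j] = in[i][j] + 1;", target="hls", factor=2))
```

The point is only that the same schedule annotation can map to different pragmas depending on the target, so .parallel() does not need a separate primitive per backend.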
@Hecmay yes, we can at least use it for the OpenCL flow. I believe the Merlin compiler supports parallel execution as well.
The issue occurs in the digit recognition example with the .parallel() primitive. I was trying to use a kernel function to update the knn_mat instead of calling hcl.compute, and to perform scheduling on the itervars inside the kernel function (i.e. hcl module). The program after modification looks like:
And the scheduling is performed as in the following snippet:
All other scheduling primitives work well, but when I call .parallel(), the program errors out with a segmentation fault.