Imperative stages cannot be streamed

chhzh123 commented 4 years ago

A single imperative stage can be streamed. However, code that mixes imperative and declarative stages cannot be streamed as usual, as shown below.

import heterocl as hcl
import numpy as np

def test_imperative():
    dtype = hcl.Float()
    A = hcl.placeholder((4, 4), "A", dtype)

    def kernel(A):

        def func(data):
            # Declarative stage: create the output tensor, initialized to zero.
            out = hcl.compute((4, 4), lambda x, y: 0, "out", dtype)
            # Imperative stage: update the tensor in place.
            with hcl.Stage("S"):
                with hcl.for_(0, 4, name="i") as i:
                    with hcl.for_(0, 4, name="j") as j:
                        out[i, j] = data[i, j] + 1
            return out

        B = func(A)
        # Declarative stage that consumes the result of the imperative stage.
        C = hcl.compute((4, 4), lambda i, j: B[i, j] + 1, "C")
        return C

    s = hcl.create_schedule([A], kernel)

    target = hcl.platform.zc706
    target.config(compile="vivado_hls", mode="csyn")
    # Move the input to the FPGA and the result back to the host.
    s.to(A, target.xcel)
    s.to(kernel.C, target.host)
    f = hcl.build(s, target=target)
    np_A = np.random.randint(0, 10, A.shape)
    hcl_A = hcl.asarray(np_A, dtype)
    hcl_B = hcl.asarray(np.zeros((4, 4), float), dtype)
    f(hcl_A, hcl_B)

heterocl.tvm._ffi.base.TVMError: [00:37:15] src/schedule/schedule_reorder.cc:345: Check failed: input.size() > 0 Cannot found boundary for output [Tensor(shape=[4, 4], op.name=C)]. The compilation flow requires the device scope to form an enclosed subgraph. Make sure the input tensors are moved to FPGA correctly...

It seems that some nodes in the stage graph are not annotated correctly.
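To make the "Cannot found boundary" failure more concrete, here is a toy model of a backward boundary search. It is only a sketch and not HeteroCL's actual pass: `find_boundary`, the dictionary-based graph, and the assumed failure mode (the in-place update in stage S is missing from the dependency path of C) are all hypothetical and serve only as illustration.

```python
from collections import deque

def find_boundary(producers, output, moved_to_device):
    """Walk backwards from `output` and collect the tensors moved to the device.

    `producers` maps each node to the nodes it reads from (hypothetical format).
    """
    seen = {output}
    queue = deque([output])
    boundary = set()
    while queue:
        node = queue.popleft()
        if node in moved_to_device:
            boundary.add(node)
            continue  # stop at the device boundary
        for parent in producers.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return boundary

# Assumed correct graph: C reads the result of stage S, which reads A and out.
complete = {"C": ["S"], "S": ["A", "out"], "out": []}

# Assumed broken graph: C is linked only to the declarative stage "out",
# so the imperative stage S (and therefore A) is never reached from C.
broken = {"C": ["out"], "out": [], "S": ["A", "out"]}

print(find_boundary(complete, "C", {"A"}))
print(find_boundary(broken, "C", {"A"}))
```

In this model the second call returns an empty set, which corresponds to the `input.size() > 0` check failing for output C.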

seanlatias commented 4 years ago

What happens if you also specify s.to(out, target.xcel)?

seanlatias commented 4 years ago

@Hecmay, do we handle the case where the tensor is declared inside the kernel (i.e., not passed in via the arguments, like out in this case)?

chhzh123 commented 4 years ago

> What happens if you also specify s.to(out, target.xcel)?

The error message is the same, but stage S is now marked twice.

[01:49:01] Mark stage S on FPGA scope...
[01:49:01] Mark stage S on FPGA scope...
[01:49:01] Mark stage C on FPGA scope...

hecmay commented 4 years ago

@seanlatias Yes, we can. In that case, we should be able to attach the tensor automatically to the imperative stage, I suppose. I added a pass to fix this issue: we still use the dataflow analysis and the restored DFG (dataflow graph) to partition the graph. If graph partitioning runs into a problem (as in this case, where the restored DFG does not correctly capture the stage hierarchy), we fall back to offloading the whole graph to the FPGA.
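The fallback described above can be sketched roughly as follows. This is an illustrative toy, not the pass that was added to HeteroCL: `device_stages`, the graph format, and the stage names (including the host-side stage `post`) are hypothetical.

```python
def device_stages(stages, producers, output, device_inputs):
    """Pick the stages to run on the device, or all of them if partitioning fails."""
    region = set()
    stack = [output]
    found_input = False
    while stack:
        node = stack.pop()
        if node in device_inputs:
            found_input = True  # reached a tensor that was moved to the device
            continue
        if node in region:
            continue
        region.add(node)
        stack.extend(producers.get(node, ()))
    if not found_input:
        # No enclosed subgraph was found: offload the whole graph instead.
        return set(stages)
    return region

stages = ["out", "S", "C", "post"]  # "post" is a hypothetical host-side stage

# Graph that captures the stage hierarchy (assumed shape).
good = {"C": ["S"], "S": ["A", "out"], "out": [], "post": ["C"]}
# Graph that misses the imperative stage on the path from C (assumed shape).
bad = {"C": ["out"], "out": [], "S": ["A", "out"], "post": ["C"]}

print(sorted(device_stages(stages, good, "C", {"A"})))
print(sorted(device_stages(stages, bad, "C", {"A"})))
```

With the well-formed graph only `C`, `S`, and `out` are selected; with the broken one every stage, including `post`, is placed on the device. That trades partitioning precision for a build that still succeeds.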

hecmay commented 3 years ago

This is fixed already. Test cases: https://github.com/cornell-zhang/heterocl/blob/heteroflow/tests/test_schedule_stream.py#L86

Imperative stages cannot be streamed #271