`hlib.nn.conv2d_nchw` with padding goes wrong

cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing

https://cornell-zhang.github.io/heterocl/

Apache License 2.0

322 stars 92 forks source link

`hlib.nn.conv2d_nchw` with padding goes wrong #185

Closed chhzh123 closed 4 years ago

chhzh123 commented 4 years ago

When I call the following function

conv1 = hlib.nn.conv2d_nchw(input_image, weight_conv1, padding=[[2,2],[2,2]])

I get an error

  File "/heterocl/hlib/python/hlib/nn.py", line 23, in _pad
    if equal_const_int(pad_before[i], 0) and equal_const_int(pad_after[i], 0):
  NameError: name 'equal_const_int' is not defined

It seems some packages are missing and should be added to header.

The bug results from non-zero padding, which will call the hlib.nn.pad function that is not covered by automatic test, as shown below.

    if padding != [[0,0],[0,0]]:
        Input = pad(Input, pad_before, pad_after)

I find PR #120 wants to fix this issue, but no ongoing efforts are made.

zhangzhiru commented 4 years ago

@seanlatias I think hlib.nn is still a work in progress? Do you want @chhzh123 to write its own conv2d kernel instead?

seanlatias commented 4 years ago

This issue is already fixed with v0.3.

zhangzhiru commented 4 years ago

Shall we merge then?

seanlatias commented 4 years ago

Let me release v0.3 after I merge the HLS IP PR.

chhzh123 commented 4 years ago

@seanlatias I'm currently developing in v0.3, but the issue still exists in this repository. I have fixed in my own branch, should I submit a PR? @zhangzhiru To support BNN, I need to add binary layers in HeteroCL, so maybe I still have to write my own conv2d kernel.

seanlatias commented 4 years ago

You should use hlib.op.nn instead of hlib.nn. I forgot to remove that one.

chhzh123 commented 4 years ago

You should use hlib.op.nn instead of hlib.nn. I forgot to remove that one.

It seems more bugs are involved...

CommandLine Error: Option 'xcore-max-threads' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

seanlatias commented 4 years ago

Can you check if you can run the lenet example under samples successfully?

seanlatias commented 4 years ago

If not then maybe this bug comes from the latest LLVM. You are using 9.0, right?

chhzh123 commented 4 years ago

Actually, the bug comes from different versions of TVM I installed. Maybe we should import heterocl.tvm in hlib/frontend instead of depending on the third-party TVM.

hecmay commented 4 years ago

TVM is changing constantly. As a workaround: checkout TVM to a specific version. Please refer to the CI deployment script here: https://github.com/cornell-zhang/heterocl/blob/v0.3/.circleci/install_tvm.sh#L4

zhangzhiru commented 4 years ago

Yes, we should check out a stable version of TVM.

chhzh123 commented 4 years ago

Yes, I know this would work. But we have already had a TVM in HeteroCL, I'm just wondering why we need to install another TVM outside HeteroCL, which seems unnecessary.

seanlatias commented 4 years ago

We would gradually remove the TVM inside HeteroCL. The reason why we install another TVM is to use them as a library like NumPy. Currently, we rely on Relay for parsing an ML network. So I think what we should do is check out a stable TVM version once in a while, instead of following the bleeding-edge version.

chhzh123 commented 4 years ago

The issue is fixed in v0.3. Thanks!