seanlatias closed this pull request 3 years ago
@chhzh123 Can you try it out?
It's weird that using hcl.const_tensor may significantly increase the running time of CPU simulation. Also, some results involving fixed-point numbers seem incorrect, and I'm figuring out why.
@chhzh123 for fixed-point numbers, you know you need to specify the data type in the API, right? We would not be able to infer it.
@chhzh123 also, do you mean that compared with hcl.copy, it's much slower?
Yes, I have specified the data type in the API, but the result of my BNN is incorrect. I'm checking if some of the layers go wrong.
No, I mean hcl.const_tensor is much slower than the previous implementation that directly passes the tensors as function arguments. I also notice that not only does the LLVM simulation slow down, but HLS synthesis also gets slower (from 10 min to 70 min).
It's possible if we are declaring a const array with a huge size. How large is the weight tensor? Do you see a similar slowdown with the small examples?
The largest weight tensor has 4096 fixed-point numbers (the batch norm layer). I tested a design with one convolutional layer, which didn't show an observable slowdown.
How many constant numbers are there in total?
About 6k for the small BNN.
@chhzh123 I think there are still some bugs with this API in terms of CPU simulation. Please go ahead and use CSIM instead.
I tested several methods this week and found it would be better to declare these large const arrays as global variables, i.e., declaring them before the top function. For my small BNN design, if the weight tensors are declared as local variables, it takes 2 hours to complete HLS synthesis. However, if I move the const tensors outside the function, it only takes 4 minutes to finish!
The description has been updated according to the fixes. A known issue about slow CPU execution has also been added.
@zhangzhiru, please see if the HLS codegen looks good to you. I'm not sure what the best name is for the header file that contains the constant arrays.
I suppose we can create a header file per constant array using the name of the corresponding tensor?
Yes, we can do that.
In this PR, we develop a new API, hcl.const_tensor, that allows users to declare a constant tensor. The initial values can be given as a Python list or a NumPy array.
Since they are constant tensors, we do not allow users to initialize a constant tensor from another HeteroCL tensor. Moreover, in this PR we also implement code generation for HLS C: an extra header file that contains all the constant arrays is generated and included by the kernel code.
Unit Tests: Please refer to tests/test_compute_basic.py. All data types with different shapes are tested.

Known Issues: Slow CPU execution due to the current implementation. I haven't come up with a better solution yet; I'll file a separate issue for it. For now, we encourage people to use CSIM if they declare a large constant array (e.g., more than 1000 elements).