fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Zero size array in nnet_common.h #300

Open KOVI89alipes opened 3 years ago

KOVI89alipes commented 3 years ago

In C++ it is illegal to declare an array of zero length: https://stackoverflow.com/a/6180200/14582299

It is, however, allowed as a GNU C extension: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

A zero-size array appears in the reduce template function:

https://github.com/fastmachinelearning/hls4ml/blob/c3509cfeb5f80054e01dead612392a9411ebfbe2/hls4ml/templates/vivado/nnet_utils/nnet_common.h#L47

template<class T, int N, class Op>
 T reduce(T* x, Op op){
    static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
    static constexpr int rightN = N - leftN > 0 ? N - leftN : 0;
    if(N == 1){
        return x[0];
    }else if(N == 2){
        return op(x[0],x[1]);
    }else{
        T left[leftN];
        T right[rightN]; // <-- zero-size array when rightN == 0, e.g. in the N == 1 instantiation
        #pragma HLS array_partition variable=left complete
        #pragma HLS array_partition variable=right complete
        ReduceLeft: for(int i = 0; i < leftN; i++){
            left[i] = x[i];
        }
        ReduceRight: for(int i = 0; i < rightN; i++){
            right[i] = x[i+leftN];
        }
        return op(reduce<T,leftN,Op>(left, op), reduce<T,rightN,Op>(right, op));
    }
 }

This causes a compilation error in MSVC.

Possible fix: replace the runtime if with if constexpr, so the branches not taken are discarded at compile time and the zero-size array is never instantiated:

 template<class T, int N, class Op>
 T reduce(T* x, Op op){
    static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
    static constexpr int rightN = N - leftN > 0 ? N - leftN : 0;
    if constexpr(N == 1){
        return x[0];
    }else if constexpr (N == 2){
        return op(x[0],x[1]);
    }else{
        T left[leftN];
        T right[rightN];

        #pragma HLS array_partition variable=left complete
        #pragma HLS array_partition variable=right complete
        ReduceLeft: for(int i = 0; i < leftN; i++){
            left[i] = x[i];
        }
        ReduceRight: for(int i = 0; i < rightN; i++){
            right[i] = x[i+leftN];
        }
        return op(reduce<T,leftN,Op>(left, op), reduce<T,rightN,Op>(right, op));
    }
 }
thesps commented 3 years ago

I think I understand the problem. However, I don't think Vivado would support the suggested fix. I believe if constexpr is a C++17 feature, while Vivado HLS supports only (some of) C++11. I think Vitis HLS brings support up to C++14.

Trying to compile an hls4ml project using the proposed fix with Vivado 2019.2 (GCC 4.6.3 for csim, clang 3.1) on Ubuntu 20.04:

../../../../firmware/nnet_utils/nnet_common.h: In function ‘T nnet::reduce(T*, Op)’:
../../../../firmware/nnet_utils/nnet_common.h:47:5: error: expected ‘(’ before ‘constexpr’
  if constexpr (N == 1){
     ^~~~~~~~~

Does it work on Windows in Vivado HLS?

anders-wind commented 3 years ago

If you wish to keep C++14 support you can solve the problem via template specializations.

Something like:

    template <int N>
    struct Reduce
    {
        static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
        static constexpr int rightN = N - leftN > 0 ? N - leftN : 0;

        template <class T, class Op>
        static T reduce(T *x, Op op)
        {
            T left[leftN];
            T right[rightN];

#pragma HLS array_partition variable = left complete
#pragma HLS array_partition variable = right complete
        ReduceLeft:
            for (int i = 0; i < leftN; i++)
            {
                left[i] = x[i];
            }
        ReduceRight:
            for (int i = 0; i < rightN; i++)
            {
                right[i] = x[i + leftN];
            }
            return op(Reduce<leftN>::template reduce<T, Op>(left, op), Reduce<rightN>::template reduce<T, Op>(right, op));
        }
    };

    template <>
    struct Reduce<2>
    {
        template <class T, class Op>
        static T reduce(T *x, Op op)
        {
            return op(x[0], x[1]);
        }
    };

    template <>
    struct Reduce<1>
    {
        template <class T, class Op>
        static T reduce(T *x, Op op)
        {
            return x[0];
        }
    };

    template <class T, int N, class Op>
    T reduce(T *x, Op op)
    {
        return Reduce<N>::reduce(x, op);
    }
thesps commented 3 years ago

I think that should work. Perhaps someone can try this and make a PR?

anders-wind commented 3 years ago

Just to make it clear: is there a reason why we copy the contents of x into left and right? I'm unsure what the HLS pragmas do, but a simpler and more efficient operation would be:

template<class T, int N, class Op>
T reduce(const T* x, Op op)
{
    static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
    static constexpr int rightN = N - leftN > 0 ? N - leftN : 0;
    if (N == 1){
        return x[0];
    }
    else if (N == 2){
        return op(x[0],x[1]);
    }
    return op(reduce<T,leftN,Op>(x, op), reduce<T,rightN,Op>(x+leftN, op));
}

If that is possible, then I guess we don't need all the template specialization etc.

I'll gladly do the PR in any case

thesps commented 3 years ago

I think that could do it. The purpose of the reduce function is to force Vivado to use a balanced tree for some binary operator. We use it to sum an array, or find the element with the maximum value for example. So the pragmas are just part of making sure we get the correct structure at the end (see image).

To see that it's worked as intended I would check that the latency and II of a model that uses it doesn't change after C Synthesis. This could be done simply with some small standalone (outside of hls4ml) test that just calls the reduce function. I'd also want to see that it's functionally correct, again that could be done in a standalone testbench to check that when used with Op_add, it gives the same result as summing the elements in a loop.

One of the example models using Softmax would also be a good probe, for example the KERAS_3layer model in the example_models directory (GitHub submodule). To run the tests offline, one can execute bash hls4ml-keras-test.sh in the test/ directory. You can skip some models by commenting out lines in keras-models.txt. I think we only need to look at KERAS_3layer.

vloncar commented 3 years ago

Also Pooling with io_stream needs to be checked.