cornell-zhang / bnn-fpga

Binarized Convolutional Neural Networks on Software-Programmable FPGAs
BSD 3-Clause "New" or "Revised" License

Why does the fpga-code only predict one label? #16

Closed ArchieGu closed 6 years ago

ArchieGu commented 6 years ago

[screenshot: test output predicting only label 7]

As you can see, the code only predicts label 7, and we're not quite sure where the problem is.

We use parameters extracted from PyTorch, and we also changed the input value range to [-1, 1], as the Theano version uses.

rzhao01 commented 6 years ago

Is this the code I provided or did you make changes? Please be clear in the question as there are other people reading these threads.

EDIT: I think you are replacing the parameters I provided with your own parameters extracted from Pytorch. There is probably an issue with your weights, but without more details I can't debug it. I'm not sure what you mean by "changed the input value range [-1, 1]" because the CIFAR10 images I provide should already be in the range [-1, 1].

ArchieGu commented 6 years ago

@rzhao01 Hi, sorry to bother you again.

  1. I didn't make any changes to your code.
  2. Yes, I used my own parameters extracted from PyTorch.
  3. The CIFAR-10 images you provided are already in the range [-1, 1], because that is the range the Theano version uses. In PyTorch the default range is [0, 1], so I had to make some changes to the images.

rzhao01 commented 6 years ago

I can't pinpoint the issue but I would guess the problem is with changing the image range. How did you implement that change?

I would recommend just taking the data array on this line, and applying (x+1)/2 scaling to each element.
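
In Python terms, a minimal sketch of that rescaling (the array name and file path here are placeholders, not the repo's actual variables):

import numpy as np

# 'data' stands for the provided CIFAR-10 inputs, already in [-1, 1].
data = np.load("cifar10_test_inputs.npy")   # placeholder path

# Map [-1, 1] -> [0, 1] so the inputs match a network trained on [0, 1] images.
data_01 = (data + 1.0) / 2.0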

ArchieGu commented 6 years ago
[screenshot of the preprocessing code from the PyTorch tutorial]

The method I use is the one provided in the PyTorch tutorial.

rzhao01 commented 6 years ago

Oh you made changes in your PyTorch script. I was confused and thought you changed the input range in my provided code.

I can't really help you debug the PyTorch training script, I'm not familiar with PyTorch and it's not my code to begin with. I can help you with my C++ code if you tell me exactly what changes you have made.

ArchieGu commented 6 years ago

OK, you've been a great help, thank you so much~

SHuixo commented 6 years ago

@rzhao01
Thanks for your answer.

  1. In the code that handles the input data (here), there is only one binarize_input_images function applied to the input. However, the implementation of that function on this line contains no binarization step. How is the binarization achieved?
  2. Do the images and labels in the data file and the weight, k, and h in the params need to be extracted from the same framework? The value range of the images differs between frameworks: PyTorch's default is [0, 1], while Matth's Theano code uses [-1, 1].
  3. Does the distribution of the training data affect the final recognition results on the board?

thank you.

rzhao01 commented 6 years ago

  1. The function is poorly named - it doesn't actually binarize the image, it just converts it to a fixed-point representation (C1InputType). In the Courbariaux BNN the first conv layer takes non-binary inputs.
  2. If you train with an input range of [0,1], then the test images need to also be in the same range. You cannot train a network on [0,1] images and expect it to work on [-1,+1] images. But it should be possible to convert between ranges: [0,1] -> [-1,1] is just the transform 2*x - 1.
  3. Yes, see my response above.

In ArchieGu's case, he said he trained on [0,1] images and his test inputs are in the same range, so this should not be an issue. I'm not sure what is wrong, I think there is another bug somewhere.

ArchieGu commented 6 years ago

@Sun-xiaohui I think what @rzhao01 means is: if your input is in the range [-1, 1], nothing needs to be changed; if your input is in the range [0, 1], either rescale it to [-1, 1] and continue with the original code, or use 2*x - 1 to modify the FPGA code so it reads input in the range [0, 1]. After that you can use your parameters to test the code.

SHuixo commented 6 years ago

@ArchieGu Yes, whether you choose to add the processing during training or in the code on the board side, the purpose of the conversion formula is to make the data range during training and the data range during testing on the board the same.

SHuixo commented 6 years ago

@rzhao01 Hello, I trained for 500 epochs using the Theano version of the code provided by Matth, and saved the corresponding parameters at the code's save path:

 np.savez(save_path, *lasagne.layers.get_all_param_values(model))

However, after replacing the parameter files in params on the board with my trained parameters, the test error rate always fluctuates between 50% and 60%. I have tried many training runs and cannot match the low error rate seen during training. Is there any detail that needs attention?

rzhao01 commented 6 years ago

Did you make the necessary changes to the parameters? For the accelerator we make two changes: (1) remove biases, and (2) transform the batch norm parameters.

SHuixo commented 6 years ago

@rzhao01 Thanks for your reply. Yes, I checked the program code again; the bias was removed by passing b=None. The save path contains 45 files in total, and each group of 5 files corresponds to w, beta, gamma, mean, inv_std. The batch norm parameter files are processed as follows:

import numpy as np

# Load the saved Lasagne batch norm parameters.
beta = np.load("./theano/arr_1.npy")
gamma = np.load("./theano/arr_2.npy")
mean = np.load("./theano/arr_3.npy")
inv_std = np.load("./theano/arr_4.npy")

k = gamma / inv_std
h = beta - mean * gamma / inv_std

I apply this to all the files to obtain w, k, and h, and save them in order as the corresponding 27 files. Is there anything wrong with either of these two steps?

Thanks.

rzhao01 commented 6 years ago

If you are using Lasagne, then inv_std is the reciprocal of the standard deviation, so you should be multiplying by it instead of dividing.

You can test whether your k/h calculation is working in Python. Simply write a new batch norm layer with k and h parameters, and have the layer return input * k + h. Then you can test your modified parameters.npy before importing it in C++.
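
For reference, a minimal sketch of the corrected conversion (the arr_1..arr_4 file names and their mapping to beta/gamma/mean/inv_std are taken from the post above and may differ depending on your save order):

import numpy as np

beta = np.load("./theano/arr_1.npy")
gamma = np.load("./theano/arr_2.npy")
mean = np.load("./theano/arr_3.npy")
inv_std = np.load("./theano/arr_4.npy")

# Lasagne's inv_std is already 1/std, so multiply rather than divide:
k = gamma * inv_std
h = beta - mean * gamma * inv_std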

SHuixo commented 6 years ago

@rzhao01 Thanks for your reply. Yes, it should be multiplication by the inv_std variable. Why can't the k and h extracted from the batch norm be used directly - why do they also need a test? The parameter arrays I obtain have dimensions consistent with the data in the params you provide. Is the process of data participation still different?

The batch norm code corresponding to k and h is as follows.

import lasagne

class MyBatchNorm(lasagne.layers.BatchNormLayer):
  def __init__(self, incoming, k, h, axes='auto', **kwargs):
    # Skip BatchNormLayer.__init__ (which would create beta/gamma/mean/inv_std)
    # and call Layer.__init__ directly; this layer only stores k and h.
    super(lasagne.layers.BatchNormLayer, self).__init__(incoming, **kwargs)
    if axes == 'auto':
      axes = (0,) + tuple(range(2, len(self.input_shape)))
    elif isinstance(axes, int):
      axes = (axes,)
    self.axes = axes
    self.k = k
    self.h = h

  def get_output_for(self, input, deterministic=False, **kwargs):
    # Build a dimshuffle pattern that broadcasts the 1-D k and h
    # over the non-parameter axes of the 4-D input.
    param_axes = iter(range(input.ndim - len(self.axes)))
    pattern = ['x' if input_axis in self.axes
               else next(param_axes)
               for input_axis in range(input.ndim)]
    k = self.k.dimshuffle(pattern)
    h = self.h.dimshuffle(pattern)
    #k = self.k
    #h = self.h
    # apply the folded batch norm: y = x * k + h
    normalized = input * k + h
    return normalized

The input here is a four-dimensional array [num, channels, rows, columns], but the corresponding k and h are one-dimensional. Their shapes are consistent with the gamma and beta parameters, and in Lasagne's official source code gamma and beta can be substituted into the calculation, but substituting k and h into the formula produces a dimension-mismatch problem, which is confusing. What is the role of this batch norm verification? Thanks.

rzhao01 commented 6 years ago

I don't quite understand your question - you'll have to explain what "data participation" means and what dimensions are mismatched.

Batch norm parameters are supposed to be 1-dimensional - each output feature map has its own k and h, so k and h should be arrays with length equal to the number of output channels.

I suggested verifying the batch norm because you mentioned testing on the board - I wanted to make sure you tested the parameters in the Python script before importing them to FPGA.

SHuixo commented 6 years ago

@rzhao01 Thank you, and sorry for the confusing way the question was worded. "Data participation" referred to the k and h parameters. The dimension mismatch is that when computing input * k + h, input is a 4-dimensional array while k and h are 1-dimensional, so computing input * k directly raises a dimension-mismatch error. But after these two transformations

 k = self.k.dimshuffle(pattern)
 h = self.h.dimshuffle(pattern)

the dimension-mismatch problem goes away.
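
For anyone hitting the same issue, a minimal NumPy illustration of why the 1-D parameters must be reshaped (dimshuffled) onto the channel axis before the elementwise multiply (the shapes below are made up for the example):

import numpy as np

x = np.random.rand(2, 128, 8, 8)   # [num, channels, rows, columns]
k = np.random.rand(128)            # one value per output channel
h = np.random.rand(128)

# x * k + h would fail: (2, 128, 8, 8) does not broadcast against (128,)
y = x * k.reshape(1, -1, 1, 1) + h.reshape(1, -1, 1, 1)   # same effect as dimshuffle(('x', 0, 'x', 'x'))
print(y.shape)   # (2, 128, 8, 8)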

Thank you for your continued support; we would like to express our sincere thanks.

Figure 1: our own training results. Figure 2: your original data.

Thanks.

rzhao01 commented 6 years ago

No problem. So you made sure k and h worked in Python? Are you still having an issue on the FPGA board?

SHuixo commented 6 years ago

Yes, the converted data works in both Python and on the FPGA board. For the C++ code you provided, after running make there are two versions of the executable, cpu and fpga. But if I want to generate a program that runs directly on the PS side of the FPGA chip, on the ARM core, without hardware acceleration, what do I need to do? Thanks.

rzhao01 commented 6 years ago

Good to hear!

You just need a software (C++) implementation of the BNN, so instead of marking top with the HLS pragmas in Accel.h, just write normal C++. In fact, the master branch has some envvar switches to run the dense layers in software: here

SHuixo commented 6 years ago

OK, you've been a great help, thank you so much~

rzhao01 commented 6 years ago

Closing due to lack of activity. All questions seem to have been answered.