fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

FPGA Output is Zero in CNN model with 8,512 parameters. #1048

Open zsrabbani opened 3 months ago

zsrabbani commented 3 months ago

I have a CNN model. I used hls4ml and all the files, including the bitfile, were generated without errors. I then used the deployment code to run the design on the FPGA (ZCU104), but the prediction output of the FPGA is always zero.

Total params: 8512 (33.25 KB)
Trainable params: 8344 (32.59 KB)
Non-trainable params: 168 (672.00 Byte)

I would appreciate any help.

Here is the Model:

rf_in = Input(shape=(1024, 2), name='rf_input')

x = Conv1D(16, 7, activation=None, padding='same', use_bias=False)(rf_in)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 7, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 5, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(16, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(8, 5, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(8, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Conv1D(4, 3, activation=None, padding='same', use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling1D(2, strides=2, padding='same')(x)

x = Flatten()(x)

dense_1 = Dense(64, activation='relu', use_bias=False)(x)
dropout_1 = Dropout(0.35)(dense_1)
dense_2 = Dense(16, activation='relu', use_bias=False)(dropout_1)
dropout_2 = Dropout(0.55)(dense_2)
softmax = Dense(7, activation='softmax', use_bias=False)(dropout_2)

model = keras.Model(rf_in, softmax)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=["accuracy"])

model.summary()

Here is the hls4ml code:

[screenshot: hls4ml conversion code]
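
For reference, a minimal sketch of a typical hls4ml conversion flow for a model like this one. The precision, reuse factor, board string, and output directory below are assumptions, not the original settings, and the board name must match a board supported by your hls4ml version.

import hls4ml

# Per-layer config generated from the Keras model; 'name' granularity allows
# tuning precision layer by layer (values below are placeholders)
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Precision'] = 'ap_fixed<16,6>'
config['Model']['ReuseFactor'] = 1

# Convert targeting a ZCU104-class board (board string is an assumption)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='VivadoAccelerator',
    board='zcu104',
    io_type='io_stream',
    output_dir='hls4ml_prj',
)
hls_model.compile()
hls_model.build(csim=False, export=True, bitfile=True)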

Here is the deployment code:

[screenshot: deployment code]
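
As a rough sketch, host code on a PYNQ-enabled ZCU104 with an AXI DMA in the block design often looks like the following; the bitstream name, DMA instance name, buffer shapes, and dtype are all assumptions that must match the generated design.

import numpy as np
from pynq import Overlay, allocate

# Load the generated bitstream (file name is an assumption)
ol = Overlay('myproject.bit')
dma = ol.axi_dma_0  # DMA instance name depends on the generated block design

# One input sample of shape (1024, 2); dtype must match the AXI interface type
in_buf = allocate(shape=(1024 * 2,), dtype=np.float32)
out_buf = allocate(shape=(7,), dtype=np.float32)

in_buf[:] = x_test[0].ravel()  # x_test is a placeholder for the test data

dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()

print('FPGA prediction:', np.asarray(out_buf))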

GeorgeMentzos commented 3 months ago

I can confirm that I am encountering similar behaviour: I am using the standard CNN from the hls4ml tutorial, quantized at 6 bits, and the prediction output I get is also random. I would add that this only occurs when I use the Resource strategy; I observe a considerable accuracy loss (from 84% to 18%) just by switching from Latency to Resource.

vloncar commented 3 months ago

Try to get a better understanding from the documentation of how the configuration of types works and what the effects of using fixed precision and quantization are, and ultimately profile your application. See the tutorial, especially parts 2 and 4.
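
For reference, a minimal sketch of adjusting per-layer precision and running numerical profiling; the layer name 'conv1d', the precision string, the output directory, and x_sample are placeholders, not a recommendation.

import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Widen a specific layer's types if profiling shows overflow or underflow
# ('conv1d' is a placeholder layer name taken from config['LayerName'])
config['LayerName']['conv1d']['Precision'] = 'ap_fixed<18,8>'

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='prj_profiling')
hls_model.compile()

# Compare weight/activation ranges against the chosen fixed-point types
hls4ml.model.profiling.numerical(model=model, hls_model=hls_model, X=x_sample)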

zsrabbani commented 3 months ago

As you can see, I used the correct setup but did not get any results. Could you help me with it?

returnwellbeing commented 2 months ago

Hi, I've encountered the same issue. I am using the extension API example, KReverse: https://fastmachinelearning.org/hls4ml/advanced/extension.html# I used the VivadoAccelerator backend and got the final hardware block, but when I deploy the hardware on a PYNQ-Z2 board, I get only zero-filled output.

nghielme commented 2 months ago

Hi, I would suggest first of all checking whether hls_model.predict(x) (the HLS model simulated on the CPU) matches model.predict(x) (the Keras model); they should at least be close to each other. If they are not, the problem can be related to the accumulator data types in the network. For that you can try using auto, so that the size of each accumulator is inferred from the operations that use it. The branch jmitrevs:keras-config-auto can be helpful for using auto to properly handle accumulator data types.
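
A quick sanity check along these lines might look like the following sketch; x_sample is a placeholder for a batch of test inputs, and the comparison threshold is arbitrary.

import numpy as np

# CPU emulation of the HLS model vs. the original Keras model
hls_model.compile()
y_keras = model.predict(x_sample)
y_hls = hls_model.predict(np.ascontiguousarray(x_sample))

print('max abs difference:', np.max(np.abs(y_keras - y_hls)))
# If the difference is large, revisit the accumulator/result precisions
# (e.g. the auto approach suggested above) before rebuilding the bitfile.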

returnwellbeing commented 2 months ago

@nghielme Thanks for the suggestion. I found that hls_model.predict(x) and model.predict(x) are indeed different; your advice was a great help in finding the cause.

@zsrabbani In my case, there was an error when generating {OUTPUT_DIR}/firmware/myproject.cpp. There must be some function calls at the end of myproject.cpp; please check yours.

void myproject(
    // Here are some inputs
) {

    // hls-fpga-machine-learning insert IO
    // Here are some pragmas

#ifndef __SYNTHESIS__
    static bool loaded_weights = false;
    if (!loaded_weights) {
        // hls-fpga-machine-learning insert load weights
        loaded_weights = true;
    }
#endif

    // ****************************************
    // NETWORK INSTANTIATION
    // ****************************************

    // hls-fpga-machine-learning insert layers
    // Layer function calls should appear here; if they are missing, the outputs of the hardware block are always ZERO
}
zsrabbani commented 2 months ago

@returnwellbeing I checked myproject.cpp; everything looks fine, and I do not see that comment in my file.