fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

QKeras bad predictions #437

Closed. HenningCode closed this issue 2 years ago.

HenningCode commented 2 years ago

Hello guys, with the fix from yesterday I tried the "qkeras_mnist_dense" example model, but the predictions are pretty bad.

I then created my own QKeras model and trained it briefly on the MNIST dataset, just to see whether that's also the case with my own model, and the results were pretty bad again. Am I doing something wrong?

This is the code to test the example model (for this I used the latest branch from GitHub):

import hls4ml
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

config = hls4ml.utils.fetch_example_model('qkeras_mnist_dense.json')
print_dict(config)  # pretty-print helper, defined in the full script later in this thread
hls_model = hls4ml.converters.keras_to_hls(config)

(x_train, y_train), (x_test, y_test) = mnist.load_data()

RESHAPED = 28*28
NB_CLASSES = 10

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

x_train = x_train.reshape(x_train.shape[0],RESHAPED)
x_test = x_test.reshape(x_test.shape[0],RESHAPED)

# Tested with and without this normalization
x_train /= 256 
x_test /= 256

y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

hls_model.compile()
print('Ground truth\n',y_test[0:5])
print('HLS Predict:\n',hls_model.predict(x_test[0:5]))
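To quantify "pretty bad", here is a quick accuracy check over the whole test set (a minimal sketch reusing the variables from the script above):

import numpy as np

# Fraction of test samples where the HLS model picks the correct class
y_hls = hls_model.predict(x_test)
acc = np.mean(np.argmax(y_hls, axis=1) == np.argmax(y_test, axis=1))
print('HLS test accuracy:', acc)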

This is the script I am using to test the QKeras model. If I change the single QDense layer to a plain Dense layer, the predictions of the HLS model are pretty close to the predictions of the Keras model. Why is that? (For this I was using the pip installation.)

import hls4ml
import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation, Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from qkeras import QDense, quantized_bits

def count_errors(x, y):
    # Count how many one-hot predictions differ from the labels
    error = 0
    for i in range(len(x)):
        if not np.array_equal(x[i], y[i]):
            error += 1
    return error

print("============================================================================\n\n\n")

NB_EPOCH = 2
BATCH_SIZE = 64
VERBOSE = 1
NB_CLASSES = 10
OPTIMIZER = Adam(learning_rate=0.0001, decay=0.000025)
VALIDATION_SPLIT = 0.1
BUILDING = 0

(x_train, y_train), (x_test, y_test) = mnist.load_data()

RESHAPED = 784

x_test_orig = x_test

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

x_train = x_train.reshape(x_train.shape[0], RESHAPED)
x_test = x_test.reshape(x_test.shape[0], RESHAPED)

# Only tested with normalization here; without it, the accuracy is bad even without QDense
x_train /= 256
x_test /= 256

print('Train shape: ', x_train.shape)
print('Test shape: ', x_test.shape)

y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

x = x_in = Input((RESHAPED,), name="input")
#x = Dense(64,name="dense0")(x)
x = QDense(64,kernel_quantizer=quantized_bits(16,6),
        bias_quantizer=quantized_bits(16,6),name="dense0")(x)
x = Activation("relu", name="act0")(x)
x = Dense(NB_CLASSES,name="dense2")(x)
x = Activation("softmax", name="softmax")(x)

model = Model(inputs=[x_in], outputs=[x])
model.summary()
model.compile(
    loss="categorical_crossentropy", optimizer=OPTIMIZER, metrics=["accuracy"])

history = model.fit(
    x_train, y_train, batch_size=BATCH_SIZE,
    epochs=NB_EPOCH, initial_epoch=1, verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT)

config = hls4ml.utils.config_from_keras_model(model,granularity='name')
config['Model']['Strategy'] = 'Resource'
print_dict(config)
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                            hls_config=config,
                                            output_dir='../output/model_std/hls4ml_prj',
                                            fpga_part='xc7z020clg400-1')
_ = hls_model.compile()

TEST_CASES = 5

out_model = model.predict(x_test[0:TEST_CASES])
# Convert the softmax outputs to one-hot vectors for comparison with the labels
out_model_change = np.zeros_like(out_model)
out_model_change[np.arange(len(out_model)), out_model.argmax(1)] = 1

print("Output of Normal Model:\n", out_model)

out_hls = hls_model.predict(x_test[0:TEST_CASES])
out_hls_change = np.zeros_like(out_hls)
out_hls_change[np.arange(len(out_hls)), out_hls.argmax(1)] = 1

print("Output of HLS Model:\n", out_hls)

print('Error Normal: ', count_errors(out_model_change,y_test[0:TEST_CASES]))
print('Error HLS: ', count_errors(out_hls_change,y_test[0:TEST_CASES]))
ChiRuiChen commented 2 years ago

Hi, I tested your code. It seems fine to me, since the numbers are small. What are your outputs? [screenshot of outputs]

HenningCode commented 2 years ago

Okay, that's kind of weird. These are my results for the QDense layer: [screenshot: QDense results]

And these are the results if I comment out the QDense layer and use the Dense layer: [screenshot: Dense results]

ChiRuiChen commented 2 years ago

Actually, the quantized_bits you set has an extra bit for the integer part: QKeras doesn't count the sign bit when you specify the number of bits, so your QDense uses <16,7> instead of <16,6>. Maybe that's the cause? Try setting quantized_bits(16, 5, alpha=1).
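For example, the QDense layer from your script would then become something like this (just a sketch, keeping the layer name from your code):

# 16 total bits, 5 integer bits plus the implicit sign bit,
# and alpha=1 so hls4ml doesn't need to undo a learned scale factor
x = QDense(64, kernel_quantizer=quantized_bits(16, 5, alpha=1),
           bias_quantizer=quantized_bits(16, 5, alpha=1), name="dense0")(x)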

HenningCode commented 2 years ago

Okay, that seems to work, but it was more related to alpha=1; the bit size isn't a factor. Did your results come from the Dense-only model, or why did you get good results?

But thanks already!

ChiRuiChen commented 2 years ago

My results come from the QDense model, exactly as in your source code. So, did setting quantized_bits(16, 5, alpha=1) help?

HenningCode commented 2 years ago

Yes, it did help! I was just wondering why it worked for you and not for me.

thesps commented 2 years ago

It seems like this might be solved?

I'd add that alpha = 1 often helps. alpha != 1 is allowed, but then you'd expect to see an ApplyAlpha layer inserted after the Dense layer in the HLSModel; you should see it if you plot the model. It's needed to restore the weight scale factors to the layer output, but the precision it uses may need to be set manually.
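For example, plotting could look like this (a minimal sketch using hls4ml's plotting utility; the output file name is arbitrary):

hls4ml.utils.plot_model(hls_model, show_shapes=True,
                        show_precision=True, to_file='hls_model.png')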

When you see disagreement between QKeras (or anything else) and hls4ml, the first thing to do is usually to profile and trace the model to find the first layer where the outputs disagree. In your case, I'd suspect it's the ApplyAlpha layers.
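A rough sketch of that tracing workflow, reusing model, config, and x_test from your script (it assumes the profiling utilities under hls4ml.model.profiling; layer names in the two traces may not match one-to-one):

# Enable tracing for every layer before converting and compiling
for layer in config['LayerName'].keys():
    config['LayerName'][layer]['Trace'] = True
hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config)
hls_model.compile()

# Layer-by-layer outputs from the HLS model and the Keras model
hls_pred, hls_trace = hls_model.trace(x_test[:10])
keras_trace = hls4ml.model.profiling.get_ymodel_keras(model, x_test[:10])
for name in hls_trace:
    if name in keras_trace:
        print(name, np.max(np.abs(hls_trace[name] - keras_trace[name])))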

HenningCode commented 2 years ago

Not quite solved. Now I tried to use QActivations, and with them the results are as bad as shown above. Do I have to add something there too?

x = QActivation("quantized_relu(4,0)", name="act0_m")(x)

But I will try to dig into it with tracing and profiling!

jmduarte commented 2 years ago

What branches/versions of hls4ml and QKeras are you using?

HenningCode commented 2 years ago

I used the pip installation of hls4ml and also tried it with the master branch, so versions 0.5.0 and 0.5.1. QKeras was always installed like this: git+https://github.com/google/qkeras.git#egg=qkeras

thesps commented 2 years ago

The issue with quantized_relu could be related to #441. Could you try either the release-v0.6.0 branch or master + #441?

HenningCode commented 2 years ago

I tried it with release v0.6.0, and it did not improve the outcome.

HenningCode commented 2 years ago

Now I tried it with PR #441 and got no improvement. I used this command to install the PR:

pip install git+https://github.com/fastmachinelearning/hls4ml.git@refs/pull/441/merge

and this script to test it:

import hls4ml
import numpy as np
import yaml
import matplotlib.pyplot as plt

import tensorflow.keras.backend as K
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

from qkeras import *

def print_dict(d, indent=0):
    # Pretty-print a (possibly nested) config dictionary
    align = 20
    for key, value in d.items():
        print('  ' * indent + str(key), end='')
        if isinstance(value, dict):
            print()
            print_dict(value, indent + 1)
        else:
            print(':' + ' ' * (align - len(key) - 2 * indent) + str(value))

def count_errors(x, y):
    # Count how many one-hot predictions differ from the labels
    error = 0
    for i in range(len(x)):
        if not np.array_equal(x[i], y[i]):
            error += 1
    return error

def yaml_save(config, filepath):
    with open(filepath, 'w') as outfile:
        yaml.dump(config, outfile, default_flow_style=False)

print("============================================================================\n\n\n")

model_dir = 'model/qkeras_pull.h5'
config_dir = 'config/qkeras_pull_config.yml'
picture_dir = 'picture/qkeras_pull.png'

NB_EPOCH = 10
BATCH_SIZE = 64
VERBOSE = 1
NB_CLASSES = 10
OPTIMIZER = Adam(learning_rate=0.0001, decay=0.000025)
VALIDATION_SPLIT = 0.1

SAVE_CONFIG = 1
CONVNET = 1

(x_train, y_train), (x_test, y_test) = mnist.load_data()

RESHAPED = 784

x_test_orig = x_test

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

if CONVNET:
    x_train = x_train[..., np.newaxis]
    x_test = x_test[..., np.newaxis]
else:
    x_train = x_train.reshape(x_train.shape[0], RESHAPED)
    x_test = x_test.reshape(x_test.shape[0], RESHAPED)

# Only tested with normalization here; without it, the accuracy is bad even without QDense
x_train /= 256
x_test /= 256

print('Train shape: ', x_train.shape)
print('Test shape: ', x_test.shape)

y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

if CONVNET:
    x = x_in = Input(
        x_train.shape[1:-1] + (1,), name="input")
    x = QConv2D(
        16, (2, 2), strides=(2, 2),
        kernel_quantizer=quantized_bits(4, 0, alpha=1),
        bias_quantizer=quantized_bits(4, 0, alpha=1),
        name="conv2d_0_m")(x)
    x = QActivation("quantized_relu(4,0)", name="act0_m")(x)
    x = QConv2D(
        32, (3, 3), strides=(2, 2),
        kernel_quantizer=quantized_bits(4, 0, alpha=1),
        bias_quantizer=quantized_bits(4, 0, alpha=1),
        name="conv2d_1_m")(x)
    x = QActivation("quantized_relu(4,0)", name="act1_m")(x)
    x = Flatten()(x)
    x = QDense(NB_CLASSES, kernel_quantizer=quantized_bits(4, 0, alpha=1),
               bias_quantizer=quantized_bits(4, 0, alpha=1),
               name="dense")(x)
    x = Activation("softmax", name="softmax")(x)

else:
    x = x_in = Input((RESHAPED,), name="input")
    #x = Dense(64,name="dense0")(x)
    x = QDense(64, kernel_quantizer=quantized_bits(5, 0, alpha=1),
               bias_quantizer=quantized_bits(5, 0, alpha=1), name="dense0")(x)
    x = QBatchNormalization()(x)
    x = QActivation("quantized_relu(4,2)", name="act0")(x)
    x = QDense(NB_CLASSES, kernel_quantizer=quantized_bits(5, 0, alpha=1),
               bias_quantizer=quantized_bits(5, 0, alpha=1), name="dense2")(x)
    x = Activation("softmax", name="softmax")(x)

model = Model(inputs=[x_in], outputs=[x])
model.summary()
model.compile(
    loss="categorical_crossentropy", optimizer=OPTIMIZER, metrics=["accuracy"])

history = model.fit(
    x_train, y_train, batch_size=BATCH_SIZE,
    epochs=NB_EPOCH, initial_epoch=1, verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT)

model.save(model_dir)

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Strategy'] = 'Resource'

if SAVE_CONFIG:
    yaml_save(config=config, filepath=config_dir)

print_dict(config)
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='../output/model_std/hls4ml_prj')
_ = hls_model.compile()

TEST_CASES = 1000

hls4ml.utils.plot_model(hls_model, show_shapes=True,
                        show_precision=True, to_file='pictures/batchdense_model.png')
plt.show()

out_model = model.predict(x_test[0:TEST_CASES])
out_model_change = np.zeros_like(out_model)
out_model_change[np.arange(len(out_model)), out_model.argmax(1)] = 1

out_hls = hls_model.predict(x_test[0:TEST_CASES])
out_hls_change = np.zeros_like(out_hls)
out_hls_change[np.arange(len(out_hls)), out_hls.argmax(1)] = 1

print('Error Normal: ', count_errors(out_model_change, y_test[0:TEST_CASES]))
print('Error HLS: ', count_errors(out_hls_change, y_test[0:TEST_CASES]))
thesps commented 2 years ago

Ah, it could be fixed by: config['LayerName']['softmax']['Strategy'] = 'Stable'

HenningCode commented 2 years ago

I did some more digging. Something in the latest master branch is breaking Conv layers again, because when I use the basic x = Activation("relu", name="act0_m")(x) activations, my HLS model is not comparable to the QKeras model.

I tried this script with the QDense layer and the QActivations set to (4,0) and it works. So the activation fix in PR #441 seems to work, but it does not work in combination with Conv layers.

HenningCode commented 2 years ago

Here I gathered what works for me and what doesn't.

Tests:

Dense and Quant(4,2): pip: works; master: works; PR #441: works
Dense and Quant(4,0): pip: fails; master: fails; PR #441: works
Conv and Quant(4,2): pip: works; master: fails; PR #441: fails
Conv and Quant(4,0): pip: fails; master: fails; PR #441: fails

thesps commented 2 years ago

Thanks for the detail. Another thing occurs to me: is this all with Strategy: Resource and io_type = io_parallel?

Specifically, I think that combination is broken for Conv2D, but io_type = io_stream should work, and it gives better synthesis results in the end.

HenningCode commented 2 years ago

If io_type = io_parallel is the default, then yes, it is.

I will retest with io_type = io_stream.

Edit: I tested it, and with #441 it seems to work for Conv and Quant(4,0):

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Strategy'] = 'Resource'
config['LayerName']['softmax']['Strategy'] = 'Stable'

print_dict(config)
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='../output/model_std/hls4ml_prj',
                                                       io_type='io_stream')

I am creating the config like this now. I got some warnings (WARNING: Hls::stream 'layer12_cpy1' contains leftover data, which may result in RTL simulation hanging.), but at least I can use this now.

jmduarte commented 2 years ago

Should be solved with #448