fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Vivado HLS synthesis hanging #457

Open AlexMontgomerie opened 2 years ago

AlexMontgomerie commented 2 years ago

Hi, I've run into an issue where synthesis has been stuck on the same line for about 14 hours. I've uploaded the log for this run, as well as the configuration and network that I used.

    import json

    import hls4ml
    from tensorflow import keras  # assuming the TensorFlow-bundled Keras

    # load the configuration
    with open("lenet.json", "r") as f:
        config = json.load(f)

    # load the keras model
    model = keras.models.load_model("lenet.keras")

    # create the hls model
    hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config,
            output_dir="outputs", io_type="io_stream", fpga_part="xc7z020clg484-1")

    # build the hls
    hls_model.build(csim=True, cosim=True)

    # get the reports (read from the same directory passed as output_dir above)
    hls4ml.report.read_vivado_report("outputs")

Does anyone know why it's stuck here? It synthesised other Conv2D layers earlier in the run, but this one produces the warning "Unable to satisfy pipeline directive: Loop's control-flow is too complicated to be pipelined." Perhaps that is the cause.
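To see which layers trigger that warning, the log can be scanned for the message text. This is only a minimal sketch: the warning string is the one quoted above, while the helper name `find_pipeline_warnings` and the assumption that each warning sits on a single log line are mine.

```python
# Warning text as quoted in this post; the exact layout of the log line is an assumption.
PIPELINE_WARNING = "Unable to satisfy pipeline directive"


def find_pipeline_warnings(log_text):
    """Return (line_number, line) pairs for every line mentioning the warning."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if PIPELINE_WARNING in line:
            hits.append((number, line.strip()))
    return hits


# Tiny made-up log excerpt, for illustration only (not taken from the attached log).
sample = "\n".join([
    "INFO: starting scheduling",
    "WARNING: Unable to satisfy pipeline directive: Loop's control-flow is too complicated to be pipelined.",
    "INFO: finished scheduling",
])

for number, line in find_pipeline_warnings(sample):
    print(number, line)
```

On a real run, the same function could be called on the contents of vivado_hls.log instead of the sample string.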

lenet.keras lenet.json vivado_hls.log

Vivado version is 2019.1 on CentOS 7.9

constiDisch commented 2 years ago

Hi, is there any news on this? We have the same problem: Vivado HLS gets stuck synthesizing a conv layer, which severely limits the convolutions we can use. Below is an example that does not synthesize (stopped after 12 h).

from tensorflow.keras.layers import Flatten, Input, Activation, MaxPooling2D
from tensorflow.keras.models import Model
import hls4ml
from qkeras import QConv2D, QActivation, QDense, quantized_bits, quantized_relu
import numpy as np

# Build the model

n_classes = 10
bits = 8
filters_per_conv_layer = [12, 12, 16, 24, 24]
neurons_per_dense_layer = []

x = x_in = Input(shape=(32,32,3))
for i, f in enumerate(filters_per_conv_layer):
    x = QConv2D(int(f), kernel_size=(3, 3), strides=(1, 1), padding='same',
                kernel_quantizer=quantized_bits(bits, 0, alpha=1),
                bias_quantizer=quantized_bits(bits, 0, alpha=1),
                kernel_initializer='lecun_uniform', use_bias=True,
                name='conv_{}'.format(i))(x)
    x = QActivation(quantized_relu(bits), name='conv_act_%i' % i)(x)
    x = MaxPooling2D(pool_size=(2, 2), name='pool_{}'.format(i))(x)
x = Flatten()(x)

for i, n in enumerate(neurons_per_dense_layer):
    x = QDense(n,
                kernel_quantizer= quantized_bits(bits, 0, alpha=1),
                bias_quantizer=quantized_bits(bits, 0, alpha=1),
                kernel_initializer='lecun_uniform', name='dense_%i' % i, use_bias=True)(x)
    x = QActivation(quantized_relu(bits), name='dense_act_%i' % i)(x)
x = QDense(n_classes,
            kernel_quantizer=quantized_bits(bits, 0, alpha=1),
            bias_quantizer=quantized_bits(bits, 0, alpha=1),
            kernel_initializer='lecun_uniform', name='output_dense', use_bias=True)(x)
x_out = Activation('softmax', name='output_softmax')(x)
model = Model(inputs=[x_in], outputs=[x_out], name='qkeras')
model.summary()

test_out = model(np.zeros(shape=(2, 32, 32, 3)))
assert test_out.shape == (2,10)

# Configure rounding and saturation
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = [
    layer.name for layer in model.layers]
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

# Do hls4ml config
hls_config = hls4ml.utils.config_from_keras_model(
    model, granularity='name')
hls_config['Model']['ReuseFactor'] = 128
hls_config['Model']['Precision'] = 'ap_fixed<16,6>'
hls_config['Model']['Strategy'] = "Resource"
for Layer in hls_config['LayerName'].keys():
    hls_config['LayerName'][Layer]['Strategy'] = "Resource"
    hls_config['LayerName'][Layer]['ReuseFactor'] = 128

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType'] = 'io_stream'  # Must set if using CNNs!
cfg['HLSConfig'] = hls_config
cfg['KerasModel'] = model
cfg['XilinxPart'] = "xc7k410t"
cfg["Backend"] = 'Vivado'
cfg['ClockPeriod'] = 8
cfg['OutputDir'] = "/tmp/test2"

hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()

hls4ml.utils.plot_model(hls_model, show_shapes=True,
                        show_precision=True, to_file="/tmp/test2/model.png")

# Synthesise rtl code using hls
hls_model.build(csim=True, synth=True, vsynth=True, export=True)

vloncar commented 2 years ago

Try a shallower model, and also try to get rid of the 'same' padding, as it results in a padding layer being inserted. That will simplify the design and hopefully speed up the synthesis.
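The two suggestions interact, which a quick shape calculation shows. The sketch below assumes the settings of the example above (3x3 kernels, stride 1, 2x2 max pooling, 32x32 input) and the usual Keras shape rules; the helper name `spatial_sizes` is made up for illustration.

```python
def spatial_sizes(input_size, n_stages, padding, kernel=3, pool=2):
    """Return the feature-map side length after each conv + pool stage.

    Stops early once the feature map is too small for another stage.
    """
    if padding not in ("same", "valid"):
        raise ValueError("padding must be 'same' or 'valid'")
    sizes = []
    size = input_size
    for _ in range(n_stages):
        if padding == "valid":
            size = size - kernel + 1  # stride-1 convolution without padding
        if size < pool:
            break  # nothing left to pool, the stage does not fit
        size //= pool  # non-overlapping pooling, remainder dropped
        sizes.append(size)
    return sizes


print(spatial_sizes(32, 5, "same"))   # [16, 8, 4, 2, 1]
print(spatial_sizes(32, 5, "valid"))  # [15, 6, 2]
```

With 'valid' padding only three of the five conv + pool stages fit on a 32x32 input, so dropping the 'same' padding also forces a shallower network here unless the pooling is changed.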

constiDisch commented 2 years ago

Thanks for the input regarding the padding. Sure, I could use a shallower model. Nevertheless, in my use case a deeper model would be beneficial, and I think there should be enough resources on the FPGA. Do you know the technical details that lead to the requirement for such shallow models? Perhaps I could work on improving this.

vloncar commented 2 years ago

Not sure, but the deeper the model is, the more tasks need to be scheduled in the dataflow region, so it becomes harder for the compiler to organize the FIFO streams between all of them. hls4ml builds a single IP from all layers; perhaps splitting that into multiple IPs and connecting them would also be viable (either in Vivado, or in HLS with the "RTL blackbox" functionality). This has been experimented with before, and having it as a feature is on my TODO list, but that list is quite long.
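As a rough illustration of the splitting idea (not an existing hls4ml feature or API), consecutive layers could be grouped into chunks, with each chunk synthesized as its own IP. The helper `split_into_ips` and the chunk size of 6 are hypothetical; the layer names follow the naming pattern of the QKeras example above, with the Flatten layer left out for brevity.

```python
def split_into_ips(layer_names, max_layers_per_ip):
    """Group consecutive layers into chunks, one chunk per hypothetical IP."""
    if max_layers_per_ip < 1:
        raise ValueError("max_layers_per_ip must be at least 1")
    return [
        layer_names[start:start + max_layers_per_ip]
        for start in range(0, len(layer_names), max_layers_per_ip)
    ]


# Rebuild the layer list of the five-stage example model.
layers = []
for i in range(5):
    layers += [f"conv_{i}", f"conv_act_{i}", f"pool_{i}"]
layers += ["output_dense", "output_softmax"]

partitions = split_into_ips(layers, max_layers_per_ip=6)
for index, chunk in enumerate(partitions):
    print(f"IP {index}: {chunk}")
```

Each chunk would then contain fewer dataflow tasks for the compiler to schedule, at the cost of having to connect the resulting IPs by hand.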

constiDisch commented 2 years ago

Thanks a lot. Would you mind sharing your initial notes and experiments? I could then take a detailed look, and if I come up with a good solution, I could try to create a PR.

vloncar commented 2 years ago

The separate-IP approach was tried before in Aigean (code, paper), but that is based on a now-old version of hls4ml, from before we had support for QKeras. I played a little with the RTL blackbox feature, but only in standalone examples, not as part of the hls4ml conversion flow. I plan on playing with Vitis HLS soon and revisiting this feature, so feel free to check back later.