fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Hello, after converting a GRU network to an IP core using hls4ml, the output of the IP core is always 0. Here is my conversion code. #1111

Open yulin3262 opened 2 weeks ago

yulin3262 commented 2 weeks ago

from pathlib import Path

import numpy as np
import pytest
import torch
import torch.nn as nn

from hls4ml.converters import convert_from_pytorch_model
from hls4ml.utils.config import config_from_pytorch_model
from hls4ml.utils import plot_model
import plotting
import hls4ml

test_root_path = Path(__file__).parent

import os
os.environ['PATH'] = '/tools/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']


class GRUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(4, 2, num_layers=1, batch_first=True, bias=True)

    def forward(self, x, h0):
        output, hnn = self.rnn(x, h0)
        return output


def test_gru(backend, io_type):
    model = GRUNet()
    model.eval()

    X_input = torch.randn(1, 1, 4)
    h0 = torch.zeros(1, 1, 2)

    pytorch_prediction = model(torch.Tensor(X_input), torch.Tensor(h0)).detach().numpy()

    config = config_from_pytorch_model(
        model, [(None, 1, 4), (None, 1, 2)], channels_last_conversion="off", transpose_outputs=False
    )
    config['Model']['ReuseFactor'] = 1
    config['Model']['Precision'] = 'ap_fixed<16,6>'
    print("-----------------------------------")
    print("Configuration")
    plotting.print_dict(config)
    print("-----------------------------------")
    output_dir = str(test_root_path / f'hls4mlprj_pytorch_api_gru_{backend}_{io_type}')

    # hls_model = convert_from_pytorch_model(model, hls_config=config, output_dir=output_dir, backend=backend, io_type=io_type, part='xcu250-figd2104-2L-e')
    hls_model = convert_from_pytorch_model(
        model, hls_config=config, output_dir=output_dir, backend=backend, io_type=io_type,
        part='xc7z045ffg900-2', clock_period=4
    )

    plot_model(hls_model, show_shapes=True, show_precision=True, to_file=None)

    hls_model.compile()

    hls_prediction = np.reshape(hls_model.predict([X_input.detach().numpy(), h0.detach().numpy()]), (1, 1, 2))

    print("*****************************************")
    print(X_input, h0, pytorch_prediction, hls_prediction)
    print("*****************************************")

    np.testing.assert_allclose(hls_prediction, pytorch_prediction, rtol=0, atol=1e-1)

    hls_model.build(csim=True, export=True)

    hls4ml.report.read_vivado_report(output_dir)


if __name__ == '__main__':
    test_gru('Vivado', 'io_stream')

bo3z commented 2 weeks ago

Can you please add a few more details? Which output is always zero: the CSim output (csim=True in hls_model.build(...)) or hls_prediction? Does the np.testing.assert_allclose line fail, or does something fail later?
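For checking the CSim side, one option is to compare the values the generated test bench writes into the project directory against hls_model.predict(). This is only a sketch, assuming the default project layout where CSim results land in tb_data/csim_results.log; the exact file name may differ between hls4ml versions.

```python
# Sketch, assuming the generated test bench writes its CSim outputs to
# tb_data/csim_results.log inside output_dir after hls_model.build(csim=True).
from pathlib import Path

import numpy as np

csim_log = Path(output_dir) / 'tb_data' / 'csim_results.log'
csim_values = np.loadtxt(csim_log)           # one row of outputs per test vector
print('CSim output    :', csim_values)
print('hls_prediction :', hls_prediction.flatten())
```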

yulin3262 commented 2 weeks ago

Thank you very much for your reply. In my project, both hls_prediction and CSim produce results, but when I integrated the IP core exported by hls_model.build(csim=True, export=True) into my FPGA project for simulation, the output of the IP core was always 0. Below are my project files, log, and the generated IP core; I can't find anything wrong. The test_recurrent_pytorch.py file in this project is also based on your official test routine. hls4mlprj_pytorch_api_gru_Vivado_io_stream.zip log.txt

I have also used hls4ml to generate an IP core for a fully connected network, and that IP core produces output normally when integrated into the FPGA project. Is there a successful case of hls4ml converting a GRU network and deploying it on an FPGA? Can you provide a reference case?

bo3z commented 2 weeks ago

Unfortunately, we don't provide examples of how to integrate the IP into a larger application, as this is application-specific and can be done in many ways:

  1. You could stream the data from the host CPU straight to the hls4ml IP block. We have some support for this through the accelerator backends (Vivado, which is stable, and Vitis, which is experimental in #991). This approach wasn't really meant for benchmarking or building end-to-end applications; instead, it was meant to show how data can be sent from the software side to the FPGA and how the result gets back. You can check out Part 7 of the tutorial: https://github.com/fastmachinelearning/hls4ml-tutorial (a rough sketch of this flow follows below this list).
  2. I guess your objective is to have a full application running on the FPGA, in which hls4ml is one part. This is of course highly application-specific. You could follow this HLS integration tutorial and import the IP into Vivado: https://byu-cpe.github.io/ecen625/hls-integration-tutorial/ , or you could copy the generated Verilog/VHDL source files into your larger project and connect the signals yourself. In that case you have to keep in mind the correct ordering of bits, what each handshake signal (ap_vld, ap_done, etc.) means, and how to connect them. I explained this in more detail in some previous answers: https://github.com/fastmachinelearning/hls4ml/discussions/1059#discussioncomment-10550572 and https://github.com/fastmachinelearning/hls4ml/discussions/1081
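For option 1, here is a rough sketch of the accelerator-backend flow, adapted from Part 7 of the tutorial. The board name is an assumption (replace it with your platform), and whether the VivadoAccelerator backend fully supports this PyTorch GRU model is not something verified in this thread:

```python
# Rough sketch of option 1 (unverified for this model): the VivadoAccelerator
# backend wraps the hls4ml IP with AXI interfaces so data can be streamed from
# the host. 'pynq-z2' is an assumed board name; use your own target platform.
hls_model_acc = convert_from_pytorch_model(
    model, hls_config=config, output_dir=output_dir + '_accel',
    backend='VivadoAccelerator', io_type='io_stream',
    board='pynq-z2',
)
hls_model_acc.compile()
# bitfile=True additionally runs the Vivado block-design flow to produce a bitstream.
hls_model_acc.build(csim=False, export=True, bitfile=True)
```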
yulin3262 commented 2 weeks ago

Thank you very much for your reply. I would like to know whether your team has any successful cases of converting a GRU network into an IP core with hls4ml and integrating that IP core into an FPGA project. If such a case exists, then the problem is very likely in my FPGA project, and I will keep looking into it. Thank you very much.

bo3z commented 2 weeks ago

Once you have the IP, it doesn't matter what the underlying architecture is (GRU, CNN, FC, etc.). At that point it's a matter of properly connecting all the signals and IP blocks. I personally haven't worked on integrating GRU models (not to say others haven't), but recently I did integrate an FC network into a larger application, and most of my findings are explained in the two links under point 2) above.
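One practical way to narrow this down is C/RTL co-simulation, which replays the same test vectors used by CSim against the generated RTL. A sketch using the standard Vivado backend build options:

```python
# Sketch: cosim=True runs Vivado HLS C/RTL co-simulation on the exported HDL.
# If CSim and cosim both match hls_model.predict() but the IP still outputs zero
# inside the larger Vivado project, the problem is most likely in how the stream
# and handshake signals (ap_start, ap_done, ap_vld, TVALID/TREADY, ...) are wired.
hls_model.build(csim=True, synth=True, cosim=True, export=True)
hls4ml.report.read_vivado_report(output_dir)
```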