fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

merge_template reuse_factor #1088

Closed sei-jgwohlbier closed 2 weeks ago

sei-jgwohlbier commented 1 month ago


Quick summary

C/RTL synthesis compilation error for missing CONFIG_T::reuse_factor.

Details

It appears that reuse_factor is missing from merge_config_template in the Vivado backend. A code diff with the fix:

git diff upstream/main
diff --git a/hls4ml/backends/vivado/passes/merge_templates.py b/hls4ml/backends/vivado/passes/merge_templates.py
index 078e004d..35aa5d36 100644
--- a/hls4ml/backends/vivado/passes/merge_templates.py
+++ b/hls4ml/backends/vivado/passes/merge_templates.py
@@ -6,6 +6,7 @@ from hls4ml.model.layers import Concatenate, Dot, Merge

 merge_config_template = """struct config{index} : nnet::merge_config {{
     static const unsigned n_elem = {n_elem};
+    static const unsigned reuse_factor = {reuse};
 }};\n"""

 merge_function_template = 'nnet::{merge}<{input1_t}, {input2_t}, {output_t}, {config}>({input1}, {input2}, {output});'
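
For reference, a minimal sketch of what the patched template renders to once the backend formats in the layer parameters. The concrete values here (layer index 4 and n_elem 2, chosen to match the config4 instance in the error further below, and a reuse factor of 1) are illustrative assumptions, not output copied from a real project:

# Sketch only: render the patched merge_config_template with assumed values
# (index=4, n_elem=2, reuse=1) to show the struct the HLS code expects.
merge_config_template = """struct config{index} : nnet::merge_config {{
    static const unsigned n_elem = {n_elem};
    static const unsigned reuse_factor = {reuse};
}};\n"""

print(merge_config_template.format(index=4, n_elem=2, reuse=1))
# struct config4 : nnet::merge_config {
#     static const unsigned n_elem = 2;
#     static const unsigned reuse_factor = 1;
# };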

Steps to Reproduce


  1. Clone the hls4ml repository
  2. Check out the main branch at commit 352c124b9ae9b3611838c28902feb9db5b973497

Running this test

from pathlib import Path

import numpy as np
import os
import shutil
import torch
import torch.nn as nn
from torchinfo import summary

from hls4ml.converters import convert_from_pytorch_model
from hls4ml.utils.config import config_from_pytorch_model

test_root_path = Path(__file__).parent

if __name__ == "__main__":

    class test(nn.Module):
        def __init__(self):
            super().__init__()

            self.downsample = nn.AvgPool1d(kernel_size=1, stride=2)

        def forward(self, x, y):
            d = self.downsample(x)
            p = torch.mul(d,y)
            return torch.cat((d, p), dim=-1)

    n_batch = 3
    n_in = 2
    size_in = 8
    X_input_shape = (n_batch, n_in, size_in)
    Y_input_shape = (n_batch, n_in, int(size_in/2))

    model = test()
    io_type='io_stream'
    backend='Vitis'
    output_dir = str(test_root_path / f'hls4mlprj_mul_{backend}_{io_type}')
    if os.path.exists(output_dir):
        print("delete project dir")
        shutil.rmtree(output_dir)

    model.eval()
    summary(model, input_size=[X_input_shape, Y_input_shape])

    X_input = np.random.rand(*X_input_shape)
    Y_input = np.full(Y_input_shape, 0.0) # constant tensor
    with torch.no_grad():
        pytorch_prediction = model(torch.Tensor(X_input),
                                   torch.Tensor(Y_input)).detach().numpy()

    # transform X_input to channels last
    X_input_hls = np.ascontiguousarray(X_input.transpose(0, 2, 1))
    Y_input_hls = np.ascontiguousarray(Y_input.transpose(0, 2, 1))

    # write tb data (append within one file handle; calling np.savetxt with a
    # filename would reopen and truncate the file on every iteration)
    ipf = "./tb_input_features.dat"
    if os.path.isfile(ipf):
        os.remove(ipf)
    with open(ipf, "ab") as f:
        for x, y in zip(X_input_hls, Y_input_hls):
            np.savetxt(f, x.flatten(), newline=" ")
            np.savetxt(f, y.flatten(), newline=" ")
    opf = "./tb_output_predictions.dat"
    if os.path.isfile(opf):
        os.remove(opf)
    with open(opf, "ab") as f:
        for p in pytorch_prediction:
            np.savetxt(f, p.flatten(), newline=" ")

    config = config_from_pytorch_model(model,
                                       input_shape=[X_input_shape[-2:],
                                                    Y_input_shape[-2:]],
                                       backend=backend,
                                       default_precision='ap_fixed<16,6>',
                                       default_reuse_factor=1,
                                       channels_last_conversion='internal',
                                       transpose_outputs=False)
    config['Model']['Strategy'] = 'Resource'
    print(config)
    print(output_dir)

    hls_model = convert_from_pytorch_model(
        model,
        output_dir=output_dir,
        input_data_tb=ipf,
        output_data_tb=opf,
        backend=backend,
        hls_config=config,
        io_type=io_type,
        part='xcvu9p-flga2104-2-e'
    )
    hls_model.compile()

    print("pytorch_prediction")
    print(pytorch_prediction)
    print("pytorch_prediction.shape: ", end=" ")
    print(pytorch_prediction.shape)

    # reshape hls prediction to channels last, then transpose
    hls_prediction = hls_model.predict([X_input_hls,Y_input_hls])
    hls_prediction = np.transpose(
        np.reshape(hls_prediction,
                   (n_batch, size_in, n_in)),
        (0,2,1)
    )

    print("hls_prediction")
    print(hls_prediction)
    print("hls_prediction.shape: ", end=" ")
    print(hls_prediction.shape)

    rtol = 1.0e-2
    atol = 1.0e-2
    assert len(pytorch_prediction) == len(hls_prediction), "length mismatch"
    assert pytorch_prediction.shape == hls_prediction.shape, "shape mismatch"
    for p, h in zip(pytorch_prediction, hls_prediction):
        np.testing.assert_allclose(p,
                                   h,
                                   rtol=rtol, atol=atol)

    # synthesize
    hls_model.build(csim=True, synth=True, cosim=True, validation=True)

causes this error

***** C/RTL SYNTHESIS *****
INFO: [HLS 200-1510] Running: csynth_design 
INFO: [HLS 200-111] Finished File checks and directory preparation: CPU user time: 0.08 seconds. CPU system time: 0.01 seconds. Elapsed time: 0.09 seconds; current allocated memory: 261.133 MB.
INFO: [HLS 200-10] Analyzing design file 'firmware/myproject.cpp' ... 
WARNING: [HLS 207-5536] 'Resource pragma' is deprecated, use 'bind_op/bind_storage pragma' instead (firmware/nnet_utils/nnet_dense_resource.h:33:9)
WARNING: [HLS 207-5536] 'Resource pragma' is deprecated, use 'bind_op/bind_storage pragma' instead (firmware/nnet_utils/nnet_dense_resource.h:107:9)
WARNING: [HLS 207-5536] 'Resource pragma' is deprecated, use 'bind_op/bind_storage pragma' instead (firmware/nnet_utils/nnet_dense_resource.h:189:9)
ERROR: [HLS 207-2972] no member named 'reuse_factor' in 'config4' (firmware/nnet_utils/nnet_merge_stream.h:62:35)
INFO: [HLS 207-4235] in instantiation of function template specialization 'nnet::multiply<nnet::array<ap_fixed<16, 6, AP_TRN, AP_WRAP, 0>, 2>, nnet::array<ap_fixed<16, 6, AP_TRN, AP_WRAP, 0>, 2>, nnet::array<ap_fixed<16, 6, AP_TRN, AP_WRAP, 0>, 2>, config4>' requested here (firmware/myproject.cpp:40:8)
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 1.13 seconds. CPU system time: 0.24 seconds. Elapsed time: 1.38 seconds; current allocated memory: 2.309 MB.

    while executing
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 [list source $tclfile] "

INFO: [HLS 200-112] Total CPU user time: 8.58 seconds. Total CPU system time: 1.14 seconds. Total elapsed time: 9.54 seconds; peak allocated memory: 262.953 MB.
INFO: [Common 17-206] Exiting vitis_hls at Wed Oct 23 20:16:47 2024...
CSynthesis report not found.
Vivado synthesis report not found.
Cosim report not found.
Timing report not found.

Expected behavior

C/RTL synthesis completes successfully.

Actual behavior

C synthesis fails with the error shown above.

Possible fix

The code change is on this fork. I can open a PR if desired, but it's a one-liner, so it may be easier for you to apply it directly.
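
As a quick sanity check after applying the change and regenerating the project, the generated merge config should now declare the member. A minimal sketch, assuming the usual hls4ml project layout in which the per-layer config structs are emitted to firmware/parameters.h, and reusing output_dir from the script above:

# Sketch only: confirm the regenerated merge config declares reuse_factor.
# Assumes the config structs land in firmware/parameters.h; adjust the path
# if your project layout differs.
from pathlib import Path

params = Path(output_dir) / "firmware" / "parameters.h"
assert "reuse_factor" in params.read_text(), "merge config still lacks reuse_factor"
print("reuse_factor present in", params)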

vloncar commented 2 weeks ago

Thanks, I ran into this problem just now. PR in #1121