Open TitechTraj opened 2 years ago
Hi @TitechTraj, I think this can be solved by block (or cyclic) partitioning the input so that each part of the array stays below the maximum bitwidth of 65536 bits.
An example of how to do this is here: https://gist.github.com/jmduarte/1a602ac8d69352b1487e15ba49b44c83
We can consider adding an example of this sort to the documentation. What do you think @vloncar ?
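To make the partitioning idea concrete, here is a minimal sketch of how one might pick a block partition factor for the 20x512 input at 16 bits per element. It is plain Python arithmetic, not hls4ml or Vivado HLS API: the helper name `block_partition_factor` is hypothetical, and it assumes that a block is acceptable as long as it does not exceed the 65536-bit limit and that an evenly dividing factor is preferred.

```python
import math

MAX_BITS = 65536  # limit quoted in the XFORM 203-133 error message


def block_partition_factor(n_elements, bits_per_element, max_bits=MAX_BITS):
    """Return the smallest factor that divides n_elements evenly
    while keeping every block at or below max_bits.

    Hypothetical helper for illustration only.
    """
    max_elems_per_block = max_bits // bits_per_element
    if max_elems_per_block < 1:
        raise ValueError("a single element is already wider than max_bits")
    # Lower bound on the number of blocks needed to respect the limit.
    min_factor = math.ceil(n_elements / max_elems_per_block)
    # Walk upward until the factor divides the array evenly.
    for factor in range(min_factor, n_elements + 1):
        if n_elements % factor == 0:
            return factor


n_elements = 20 * 512          # flattened input size from the question
bits_per_element = 16          # ap_fixed<16,6>
factor = block_partition_factor(n_elements, bits_per_element)
block_bits = (n_elements // factor) * bits_per_element
print(factor, block_bits)
```

For this input the sketch yields a factor of 4, which gives blocks of 2560 elements (40960 bits each). Whether that factor also schedules well is a separate question.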
P.S. @vloncar I initially tried to update the pragma with VivadoArrayVariableConverter, but I think this line https://github.com/fastmachinelearning/hls4ml/blob/59ed8249f4bbdb4b23ff0c6f0bfc976b44d3ac7e/hls4ml/backends/fpga/fpga_types.py#L200-L201 prevents it from working when you just update the pragma for the partitioning. Is that expected behavior or should we fix it?
cc: @rfforelli @rohahann
@jmduarte This will probably not work to help push larger models onto FPGAs and will inevitably cause other problems with scheduling. I don't think we should support this type of hacking at this moment.
I am trying to compile a GRU network. Even though GRUs are not officially supported yet, I managed to merge some branches and got a small GRU example working. I am now trying to synthesize my much larger network, which has 2 stacked GRU cells and a 20x512 input (using ap_fixed<16,6> precision). I ran into some issues, such as loop unrolling, but I read that changing the strategy to Resource and changing the reuse factor would help. I got past that error by setting the reuse factor to 64, and now I am stuck on this error:
ERROR: [XFORM 203-133] Bitwidth of reshaped elements (163840 bits) exceeds the maximum bitwidth (65536 bits) for array 'in_local.V' (firmware/myproject_axi.cpp:16).
I don't have enough HLS knowledge to figure out the problem. The number seems to match my input size: 20 x 512 x 16 = 163840 bits, which is larger than the maximum bitwidth. Is the maximum bitwidth fixed for some reason? Should I find a way to make my input smaller? Or can this be solved differently? Thanks!
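As a quick sanity check of the arithmetic above, the following sketch recomputes the flattened width of the input and compares it with the limit from the error message. The function name `flattened_bits` is made up for illustration, and the only inputs are the shape and precision stated in the question.

```python
MAX_BITWIDTH = 65536  # limit quoted in the error message


def flattened_bits(shape, bits_per_element):
    """Total width in bits if the whole array is reshaped into one word."""
    total = bits_per_element
    for dim in shape:
        total *= dim
    return total


total_bits = flattened_bits((20, 512), 16)  # 20x512 input, ap_fixed<16,6>
exceeds_limit = total_bits > MAX_BITWIDTH
ratio = total_bits / MAX_BITWIDTH
print(total_bits, exceeds_limit, ratio)
```

The result matches the 163840 bits reported by the tool, which is 2.5 times the limit, so the error does come from the full input being treated as a single reshaped element.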