vandenBergArthur opened this issue 1 year ago
Dense on 3D input is supported. Behind the scenes, it results in exactly the same code as your 2nd alternative, i.e., a pointwise Conv2D. The reason you don't see DSPs being used is the number of bits involved: with fewer than 9 bits, the compiler chooses not to allocate DSPs and instead performs the multiplications in LUTs.
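To illustrate why a Dense layer on a 3D input and a pointwise (1x1) Conv2D can lower to the same code, here is a minimal NumPy sketch. It relies only on the documented Keras behaviour that Dense contracts the last axis of its input. The helper names `dense_last_axis` and `pointwise_conv2d` and the shapes are hypothetical, chosen for illustration; this is not hls4ml's implementation.

```python
import numpy as np

def dense_last_axis(x, kernel, bias):
    """What Keras Dense does on a rank-3 input: multiply only the last axis by the kernel."""
    # x: (..., n_in) @ kernel: (n_in, n_out) -> (..., n_out); matmul broadcasts the leading axes
    return x @ kernel + bias

def pointwise_conv2d(x, kernel, bias):
    """1x1 Conv2D (channels_last, stride 1): the same small matmul at every (row, col) position."""
    rows, cols, _ = x.shape
    out = np.empty((rows, cols, kernel.shape[1]))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = x[r, c] @ kernel + bias
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 5, 3))    # (height=1, width=nodes, in_channels), batch axis omitted
kernel = rng.standard_normal((3, 4))  # (in_channels, out_channels)
bias = rng.standard_normal(4)

y_dense = dense_last_axis(x, kernel, bias)
y_conv = pointwise_conv2d(x, kernel, bias)
print(y_dense.shape, np.allclose(y_dense, y_conv))  # (1, 5, 4) True
```

Both functions perform the same per-position vector-matrix product, which is why a converter can map one onto the other.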
I understand that DSPs are not used when the number of bits is less than 9. But I don't understand how changing the reuse factor from 64 to 25 changes the resources from this:
To this:
Is it because 25 is an invalid reuse factor for the Resource strategy?
I am using hls4ml 0.7.0, but this error is still present on the main branch:
In file included from firmware/myproject.cpp:4:
firmware/parameters.h:28:44: error: ‘Resource’ is not a member of ‘nnet’; did you mean ‘resource’?
28 | static const unsigned strategy = nnet::Resource;
| ^~~~~~~~
| resource
firmware/parameters.h:59:44: error: ‘Resource’ is not a member of ‘nnet’; did you mean ‘resource’?
59 | static const unsigned strategy = nnet::Resource;
| ^~~~~~~~
| resource
g++: error: myproject.o: No such file or directory
Anyway, thanks for the insights @vloncar
You can see in the report exactly what the DSP is used for. When using the resource strategy, the weights are stored in BRAM, and some bookkeeping arithmetic is needed to select the right BRAM and fetch the right element. This arithmetic requires multiplication. Verify this in the logs.
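A rough way to picture that bookkeeping, assuming the block factor ceil(n_in * n_out / reuse_factor) described in this thread and the Xilinx description of block reshaping (the array is split into `factor` blocks of consecutive elements that are packed side by side). This is a simplified model, not hls4ml's generated code. The layer size and the helpers `block_depth`, `locate_weight` and `is_power_of_two` are hypothetical. In this model, when the block depth is a power of two, finding a weight's lane and word reduces to bit slicing; otherwise it needs a real division and modulo, which may be where a DSP comes in.

```python
import math

def block_depth(n_weights, factor):
    """Words per lane after splitting n_weights into `factor` blocks of consecutive elements."""
    return math.ceil(n_weights / factor)

def locate_weight(i, depth):
    """(lane, word) holding weight i: lane = which block, word = offset inside that block."""
    return i // depth, i % depth

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

n_in, n_out = 16, 16  # hypothetical layer size
n_weights = n_in * n_out
for reuse_factor in (64, 25):
    factor = math.ceil(n_weights / reuse_factor)  # assumed block factor
    depth = block_depth(n_weights, factor)
    kind = "bit slicing" if is_power_of_two(depth) else "real div/mod"
    print(reuse_factor, factor, depth, locate_weight(100, depth), kind)
```

For this toy layer, RF = 64 gives a depth of 64 (a power of two), while RF = 25 gives a block factor of 11 and a depth of 24, so the index arithmetic is no longer a simple bit selection.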
Sorry for the many questions, but the paper I read (Fast convolutional neural networks on FPGAs with hls4ml) states that "The BRAM consumption does not depend on the reuse factor". So I don't understand how changing the RF drops the BRAM usage from 18 to 0.
Also, I've looked into the log, but I cannot find exactly what the DSP is used for. My apologies, but I'm still a novice in the FPGA field.
(Sorry for closing and reopening this issue all the time; it closes automatically when I comment, and I don't know how to change that.)
The statement from the paper refers to that particular model, not to models in general. When using the resource strategy, the weights are partitioned with an ARRAY_RESHAPE pragma (read about it in the Xilinx docs) with a block factor of ceil(n_in * n_out / reuse_factor). Here n_in and n_out are the number of input and output neurons in a fully-connected layer, or, for convolutional layers, the number of input features times the kernel size and the number of output filters. We don't enforce the resource allocated for these arrays, though it is assumed that they go to BRAMs if sufficiently large. This is controlled by a heuristic inside the Xilinx compiler (a black box), and we chose to trust it rather than expose the setting to the user (though we may do that in the future). The heuristic may choose to implement everything in LUTs, which is why you no longer see BRAMs allocated. If BRAMs are allocated, they will be affected by the reshape pragma, so some arithmetic to index into them will be required, especially if the reuse factor is not a power of two.
Check the report in detail to see exactly what the DSP is used for. You are only looking at the big summary, but there are per-layer reports. You can browse them via the GUI if you run vivado_hls -p myproject_prj (myproject_prj will be generated in the output directory when you call build()), or you can view the .rpt text files themselves, which are buried in the myproject_prj directory. Start with the top-level report and go deeper into the layers to see which resources correspond to them. You can also use the analysis view of the GUI to match source lines to resources, though the matching is not exact.
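If you prefer the text reports over the GUI, a small script can list every .rpt file under the project directory and show the lines that mention DSPs. This is a convenience sketch: it assumes only that the reports are plain-text `*.rpt` files somewhere below myproject_prj, as described above. The directory layout and table format vary between Vivado HLS versions, so the `DSP` keyword filter is just a heuristic, and the function names are hypothetical.

```python
from pathlib import Path

def find_reports(project_dir):
    """Return all .rpt files below project_dir, sorted by path."""
    return sorted(Path(project_dir).rglob("*.rpt"))

def lines_mentioning(report, keyword="DSP"):
    """Yield (line_number, text) for each line of the report containing keyword."""
    with open(report, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if keyword in line:
                yield number, line.rstrip()

def print_dsp_lines(project_dir="myproject_prj"):
    """Print every DSP-related line, grouped by report file."""
    for rpt in find_reports(project_dir):
        hits = list(lines_mentioning(rpt))
        if hits:
            print(f"== {rpt}")
            for number, text in hits:
                print(f"{number:5d}: {text}")
```

After build() finishes, call something like `print_dsp_lines("<output_dir>/myproject_prj")` and compare the per-layer reports with the top-level one.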
If you used an invalid reuse factor, the tool should report it during conversion. A quick glance at the log in the shared notebook reveals that you also miss timing. I would consider that a more important issue. You should experiment to find what's causing it (e.g., by varying the reuse factor).
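As a back-of-the-envelope check before synthesis, the block factor formula quoted above can be evaluated per layer. Here, n_in for a convolution is the number of input channels times the kernel size, as described in the thread. The layer shapes are hypothetical. The `divides_evenly` flag is only a simplified proxy (an assumption, not hls4ml's actual validity rule), so the converter's own warning remains authoritative.

```python
import math

def block_factor(n_in, n_out, reuse_factor):
    """ceil(n_in * n_out / reuse_factor): roughly, the weights fetched per cycle."""
    return math.ceil(n_in * n_out / reuse_factor)

def conv_n_in(in_channels, kernel_h, kernel_w):
    """For a convolution, n_in is the input features times the kernel size."""
    return in_channels * kernel_h * kernel_w

# Hypothetical layer shapes, for illustration only.
layers = {
    "dense_64x64": (64, 64),
    "conv2d_1x1": (conv_n_in(3, 1, 1), 64),
    "matmul_1xN": (conv_n_in(1, 1, 25), 25),
}

for name, (n_in, n_out) in layers.items():
    for rf in (25, 64):
        bf = block_factor(n_in, n_out, rf)
        divides_evenly = (n_in * n_out) % rf == 0  # simplified proxy, not hls4ml's rule
        print(f"{name}: RF={rf} block_factor={bf} divides_evenly={divides_evenly}")
```

When `divides_evenly` is False, the ceil rounds up, so the reshaped array has padding and its depth is no longer a clean multiple of the reuse factor.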
Oh, and the paper you quote refers to the older implementation, which you aren't using, but everything I said above is also valid for the current default implementation.
Hi, thank you so much for the insightful reply @vloncar! However, even by being creative with the supported Keras layers, it seems I will not be able to successfully implement the whole model. So we want to use the Extension API, because this seems to be our only option left. We have 2 files that perform the calculations: one in Python and one in C++. But when trying to implement the model as a Keras model, it seems that our model is too big for the Vivado backend (based on the other issues I posted). Using the VivadoAccelerator backend resulted in a synthesized model (for a part of the entire model). So my question is: can we use the VivadoAccelerator backend with the Extension API? All the examples I saw (KLLoss and the example on the documentation page) use the regular Vivado backend.
You can use the Extension API with the VivadoAccelerator backend. If your model doesn't work with Vivado but works with VivadoAccelerator, you're probably doing something wrong, since there should be no difference in the generated model architecture or the HLS code it uses.
TL;DR at the bottom
Hi all, as mentioned in my other post #747, I am trying to implement a graph convolution. So I need a matrix multiplication A * B = C, where A is my input tensor and B is an adjacency matrix. To realize this, I have created 2 alternatives that use supported Keras layers so that I can use hls4ml to deploy this model. (We are also trying to use the Extension API to implement the whole model.)
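For reference, here is the target computation C = A * B in NumPy, plus one way a bias-free Dense layer can stand in for it. Dense multiplies the last axis of its input by its kernel, so using the adjacency matrix as the kernel reproduces A * B when the nodes sit on A's last axis. The sizes, the random 0/1 adjacency matrix, and the helper `dense_no_bias` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
channels, nodes = 4, 6                                # hypothetical sizes
A = rng.standard_normal((channels, nodes))            # input features, nodes on the last axis
B = (rng.random((nodes, nodes)) > 0.5).astype(float)  # hypothetical 0/1 adjacency matrix

C = A @ B  # graph-convolution step: C[c, j] = sum_i A[c, i] * B[i, j]

def dense_no_bias(x, kernel):
    """What a Dense layer with use_bias=False computes: x times kernel along the last axis."""
    return x @ kernel

C_dense = dense_no_bias(A[np.newaxis], B)  # add a batch axis, as Keras would see it
print(C_dense.shape, np.allclose(C_dense[0], C))  # (1, 4, 6) True
```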
Alternative 1
In the first alternative, I simply use Dense layers to mimic the matrix multiplication.
Where adj1 is a tensor that represents the adjacency matrix:
For a starting configuration, I used the default precision and RF = 64 (as in tutorial 7, where a model with dense layers is deployed to the PYNQ-Z2 board):
But when I build this model with hls_model.build(csim=False, export=True), I get a rather odd output:
I compared these results with those of the untrained model from the tutorial, and my resource usage is suspiciously low.
Model from tutorial:
I have found out that the 3D-input to the Dense layer is probably the reason. I tested a similar model with a 2D input shape, and the resource usage seems more normal.
So, does hls4ml not support Dense layers with 3D inputs? In Keras itself it should be allowed:
Alternative 2
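import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, Reshape, Permute
from tensorflow.keras.models import Model

# nodes, in_channels, out_channels and adj1 (nodes x nodes adjacency matrix) are assumed to be defined beforehand
a = Input(shape=(1, nodes, in_channels), name='input_x')
# pointwise convolution: per-node feature transform
b = Conv2D(filters=out_channels, kernel_size=1, strides=1, padding='valid', data_format='channels_last', use_bias=True, name='conv2d_1x1')(a)
b = Reshape(target_shape=(nodes, out_channels), name='reshape1')(b)
c = Permute((2, 1), name='permute1')(b)
c = Reshape(target_shape=(out_channels, nodes, 1), name='reshape2')(c)
# (1, nodes) convolution with the adjacency matrix as a fixed kernel: the actual matrix multiplication
d = Conv2D(filters=nodes, kernel_size=(1, nodes), strides=1, padding='valid', data_format='channels_last', use_bias=False, kernel_initializer=tf.keras.initializers.Constant(adj1), name='matmul')(c)
model = Model(inputs=a, outputs=d)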
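When checking Alternative 2 against hls4ml's output (for example, in C simulation), a plain NumPy reference can help. Tracing the shapes: the first layer computes X·W + b per node, the Reshape/Permute/Reshape chain transposes that to (out_channels, nodes), and the (1, nodes) convolution with `nodes` filters multiplies by the adjacency matrix. So the output should equal (X·W + b)ᵀ·adj1. This assumes the `Constant(adj1)` initializer fills the (1, nodes, 1, nodes) kernel row-major, so that kernel[0, i, 0, j] = adj1[i, j]. The sizes and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
nodes, in_channels, out_channels = 5, 3, 4            # hypothetical sizes
x = rng.standard_normal((1, nodes, in_channels))      # one sample of Input(shape=(1, nodes, in_channels))
W = rng.standard_normal((in_channels, out_channels))  # 1x1 conv kernel with the 1x1 axes squeezed out
b = rng.standard_normal(out_channels)
adj1 = rng.standard_normal((nodes, nodes))

# Layer by layer, mirroring the Keras graph (batch axis omitted).
h = x @ W + b                                   # conv2d_1x1 -> (1, nodes, out_channels)
h = h.reshape(nodes, out_channels)              # reshape1
h = h.transpose(1, 0)                           # permute1 -> (out_channels, nodes)
h = h.reshape(out_channels, nodes, 1)           # reshape2
kernel = adj1.reshape(1, nodes, 1, nodes)       # assumed layout of Constant(adj1)
# A 'valid' (1, nodes) conv over an (out_channels, nodes) image leaves one column:
# d[o, 0, j] = sum_i h[o, i, 0] * kernel[0, i, 0, j]
d = (h[:, :, 0] @ kernel[0, :, 0, :])[:, np.newaxis, :]  # -> (out_channels, 1, nodes)

reference = (x[0] @ W + b).T @ adj1             # closed form: (XW + b)^T adj1
print(d.shape, np.allclose(d[:, 0, :], reference))  # (4, 1, 5) True
```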
In file included from firmware/myproject.cpp:4:
firmware/parameters.h:28:44: error: ‘Resource’ is not a member of ‘nnet’; did you mean ‘resource’?
28 | static const unsigned strategy = nnet::Resource;
| ^~~~~~~~
| resource
firmware/parameters.h:59:44: error: ‘Resource’ is not a member of ‘nnet’; did you mean ‘resource’?
59 | static const unsigned strategy = nnet::Resource;
| ^~~~~~~~
| resource
g++: error: myproject.o: No such file or directory