Closed freaksie closed 1 month ago
@bo3z Thanks for the response. This clears up all the questions I had. We are implementing it on the Xilinx ZCU216 board, but I see that HLS4ML doesn't support Vivado versions after 2020.1, and the ZCU216 is a newer board that works best with the latest version of Vivado.
With the current version of Vivado (2018.3), there are many dependency errors during co-simulation. So are you planning to roll out a newer version of HLS4ML with updated dependencies?
Thank you again.
Questions 1-3 are solved, thanks to @bo3z. Questions 4 and 5 remain.
Thanks in advance
4-5 are significantly harder questions - in short, it's the HLS compiler.
Depending on where the weights are stored (Latency or Resource strategy), the HLS compiler (and algorithm) will apply different optimisations. From here, it seems you are using the Latency strategy - weights are stored in registers and loops are unrolled while limiting the number of parallel products. In this case, the compiler could do several things: reduce the precision if a weight can be represented in fewer bits than allotted, reorganise the order of multiplications to shorten the adder trees, merge weights into a single, larger word, etc. In general, the compiler tends to optimise quite aggressively when loops are fully unrolled and parameters are in registers. While you can certainly spend some time decoding which weight maps to which word in the generated Verilog, it becomes impractical quite quickly (and it will change for different neural networks, weight values, etc.). As long as the design produces equal outputs between CSim (C++) and CoSim (RTL) simulation, you can assume the compiler did a good job. To check equality, use the validation flag in build.
On the other hand, the Resource strategy is a bit more predictable. It will transpose the weights and store them in Block RAM, the contents of which can be inspected in a predictable manner. Paper with an explanation here: https://arxiv.org/abs/2308.05170
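The strategy is selected through the hls4ml config dict. Here is a minimal sketch of the relevant keys; the layer name 'fc1' is hypothetical and the values are illustrative, but the layout mirrors what config_from_keras_model produces:

```python
# Sketch of an hls4ml config dict controlling where weights live.
# 'fc1' is a hypothetical layer name; values are illustrative only.
config = {
    'Model': {
        'Strategy': 'Latency',        # weights in registers, loops unrolled
        'ReuseFactor': 1,             # limits the number of parallel products
        'Precision': 'ap_fixed<16,6>',
    },
    'LayerName': {
        # per-layer override: this layer's weights go to Block RAM
        'fc1': {'Strategy': 'Resource', 'ReuseFactor': 4},
    },
}
print(config['Model']['Strategy'], config['LayerName']['fc1']['Strategy'])
```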
Oh, I understand. So the HLS compiler is responsible for the weight representation. Thanks @bo3z, this clears many things up. And that paper is really insightful.
A few more questions though. 1) According to HLS4ML, the input shape is 2<16,6>, which is converted into 32'd. Now assume that my input is [-5.306, -1.689]. Assuming I have six bits representing the signed integer value and 10 bits for the fractional value, the 32 bits can be 111101=-5; 0100110000=306; 111111=-1; 1010110001=689. This gives me a total of 32 bits as 11101101001100001111111010110001, and its equivalent integer is 3979411121.
Now I am using a cocotb testbench. Is this the correct way to represent the signed input value?
2) The output I got from the network is 16 bits, i.e. 0000000000111001, whose integer equivalent is 57, which is not the correct output for the above-mentioned input. The correct output is 0.0527344, which also matches csim_result.log. So what am I doing wrong here?
3) How do I represent inputs like [-5.0288, -3.002] and [-0.294, 0.201]?
For reference, the network architecture and other details are in the main comment.
Let me know if more information is required.
Thanks in advance.
It doesn't really work like that. The numbers are represented as integer+fractional part, but for 5.306 the fractional part is not 306, rather 0.306. You use 10 bits for that, giving you 1024 values between 0 and 1 (the increment is ~0.000976562). So instead of 0.306 you'll represent 0.306÷0.000976562=313.344... Thus you will represent 313 and lose a bit of precision (the number will be 5.305664). 313 is 0100111001 in 10 bits in two's complement. Combining this with 5, which is 000101 in 6 bits, you get the final number as 000101 0100111001 (without the space, of course).
But this is only positive; how about negative? Let's take -5.306. To get integer+fractional you will need to do -6+0.694. Using the formula above, 0.694 will be represented as 710, or 1011000110. So finally the number is 111010 1011000110.
In a similar way, -1.689 is 111110 0100111110.
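These hand conversions can be checked with a short Python sketch. It assumes ap_fixed's default AP_TRN behaviour, i.e. truncation toward minus infinity, and the helper name is mine:

```python
import math

def to_fixed_bits(x, width=16, int_bits=6):
    """Two's-complement bit string of x as ap_fixed<width,int_bits>,
    truncating toward minus infinity (ap_fixed's default AP_TRN)."""
    frac_bits = width - int_bits
    scaled = math.floor(x * (1 << frac_bits))  # e.g. -5.306*1024 -> -5434
    raw = scaled & ((1 << width) - 1)          # two's-complement wrap to 16 bits
    return format(raw, f'0{width}b')

print(to_fixed_bits(5.306))   # 0001010100111001
print(to_fixed_bits(-5.306))  # 1110101011000110
print(to_fixed_bits(-1.689))  # 1111100100111110
```

The last two lines reproduce the 111010 1011000110 and 111110 0100111110 patterns worked out above.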
If this is tricky math, a shortcut that I use is to print out the values from the generated C++ testbench, since you already have them converted to ap_fixed types before calling the top function. You can access individual bits with the [] operator; the least significant bit has index 0. For example:

#include <ap_fixed.h>
#include <cstdio>

int main() {
    ap_fixed<16,6> a = -1.689;
    for (int i = a.width - 1; i >= 0; i--)
        printf("%d", a[i].get());  // print bits from MSB to LSB
}

prints out 1111100100111110.
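Going the other way, from a printed bit string back to a real value, can be sketched in plain Python (not part of hls4ml; the helper name is mine):

```python
def from_fixed_bits(bits, int_bits=6):
    """Interpret a two's-complement bit string as ap_fixed<len(bits), int_bits>."""
    width = len(bits)
    raw = int(bits, 2)
    if bits[0] == '1':        # sign bit set: subtract 2^width
        raw -= 1 << width
    return raw / (1 << (width - int_bits))

print(from_fixed_bits('1111100100111110'))  # -1.689453125
print(from_fixed_bits('0000000000111001'))  # 0.0556640625 (integer value 57)
```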
Thank you so much @vloncar for this. I was also confused about whether to represent it in two's complement or signed magnitude, but your answer has made that clear. And thanks for that code snippet too.
@bo3z
Is there a way to increase the LUT size? I guess the default is 1024; I want to increase it to 4096. Moreover, I am using the softmax function, which in turn uses LUTs for inv_exp() and exp(). So can I increase the LUT for inv_exp() as well?
Thanks in advance.
Hi Neel,
This is doable with the table_size config parameter, but it should really be a last resort. It's rarely needed, and there are many things you can do that would be a better choice. Check out the hls4ml-tutorial where all these things are showcased.
hello @vloncar
Thanks for the response.
Yes, I can use argmax to get the classification result, but this particular application requires the probability distribution as well.
Also, I am thinking of implementing online learning in the future, which will require more precise LUT operations.
Where can I pass this table_size parameter?
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='../lbl',
    part='xc7vx485tffg17612',
    input_data_tb='../Data/xtest.npy',
    output_data_tb='../Data/ytest.npy',
    clock_period=2)
Here?
@vloncar
Deviation from the original output of the softmax layer:
Original output: [[0.80757248, 0.1678711, 0.02455642]] VHDL output: [[1, 0.135742, 0.0185547]]
Let me know if you need more information. Also, can you mention where I can use the table_size argument?
config = hls4ml.utils.config_from_keras_model(keras_model, granularity='name')
config['LayerName']['your_softmax_layer_name']['TableSize'] = whatever_you_want_but_remember_this_is_a_bad_idea
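For intuition on why the table size matters here, below is a rough, pure-Python model of a table-based softmax. This is not the hls4ml implementation: the table range, the uniform indexing, and the use of a single exp table are all simplifying assumptions of mine.

```python
import math

def lut_softmax(logits, table_size=1024, x_range=8.0):
    """Toy softmax where exp() is read from a uniform lookup table
    over [-x_range, x_range). A rough model, not the hls4ml code."""
    table = [math.exp(-x_range + 2 * x_range * i / table_size)
             for i in range(table_size)]

    def lut_exp(x):
        i = int((x + x_range) * table_size / (2 * x_range))
        return table[max(0, min(table_size - 1, i))]

    e = [lut_exp(x) for x in logits]
    s = sum(e)
    return [v / s for v in e]

logits = [2.3, 0.7, -1.1]
exact = [math.exp(x) for x in logits]
exact = [v / sum(exact) for v in exact]
coarse = lut_softmax(logits, table_size=1024)
fine = lut_softmax(logits, table_size=4096)
print([round(p, 4) for p in coarse])
print([round(p, 4) for p in fine])
```

A larger table makes each exp() lookup finer-grained, so the probabilities drift less from the exact softmax; the same intuition applies to the inv_exp table.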
Quick Summary
Trying to build an HLS project; during C simulation it says "Unable to open input/predictions file, using default input". In "myproject_test.cpp" it is trying to find the .dat file in "tb_data/", which only contains csim_result.log.
Now, when it is choosing a random value for input, why does it choose only one value, while the input to my neural network model is of shape (2,)?
Also, when I went through the main Verilog file, i.e. "myproject.v", it takes only one [31:0] input_2_V.
Please let me know if I am missing something.
I have attached the architecture of the neural network for a better understanding of the problem.
Inside the first hidden layer module of the network
(26'd406) is supposed to be my weight for node 4 and input feature 1. Now "$signed(r_V_10_0_4_fu_103_p1))" contains the first input feature, i.e. a 16-bit value, but I don't find the weight (26'd406) in the original w2.h file.
Below is the w2.h file.
Question Summary
1) Why does tb_data/ not have an input file?
2) Why is it only taking one random value while the input layer has 2 inputs?
3) Why is the input to the main module just one [31:0] instead of 2?
4) Why do the weights not match the ones in the model?
5) Why are these weights 26'd? According to the hls4ml config they should be <16,6>.