Closed annikaolson closed 8 months ago
Note: this PR has a merge conflict with hw/amdc_reve.bd, the FPGA block design file. This file must be edited by hand in Vivado, so git cannot auto-merge.
I created this issue for @annikaolson since I merged #307 on top of her PR, which created the conflict. We hit the same problem last week, so I helped @annikaolson go through and fix the merge conflict with the bd file. It is very painful since you have to do it by hand. All that said, since she has already gone through it, let's accept the merge conflict on this PR, since this PR is a test PR anyway (it will not be merged).
Closing this PR since it is a practice exercise for new contributor onboarding.
The purpose of this PR is to compare the timing of an adder implemented as a Verilog IP core against the same adder implemented in C code. This PR updates the Rev E block design by adding the IP core and adds a user app to test the adder in the FPGA versus the CPU.
C Code and Verilog Changes
Added an adder app/command to compute the average time per operation of the C code version of an addition function versus the Verilog implementation.
Created an "adder" folder to house the code. App files can be found here.
While there are no task files for this app, a new command was added to test the average time of an operation in the FPGA versus the CPU. Command files can be found here.
Note that this is the code that was most modified in testing and was the primary point of analysis.
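As a rough illustration of the "average time per operation" measurement described above, here is a minimal sketch for the CPU case. It is not the code from this PR: `adder_cpu`, `now_ns`, and `avg_time_per_op_ns` are hypothetical names, and the standard C11 `timespec_get` call stands in for whatever timer the real command uses on the target.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the C version of the adder. */
static int32_t adder_cpu(int32_t in1, int32_t in2)
{
    return in1 + in2;
}

/* Current time in nanoseconds. Assumption: C11 timespec_get is used here
 * only as a placeholder for the platform's own timer. */
static double now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Run the adder n times and return the average time per operation in ns.
 * The operands and result are volatile so the loop is not optimized away. */
static double avg_time_per_op_ns(uint32_t n, int32_t in1, int32_t in2)
{
    volatile int32_t a = in1;
    volatile int32_t b = in2;
    volatile int32_t out = 0;

    if (n == 0) {
        return 0.0; /* nothing to average */
    }

    double start = now_ns();
    for (uint32_t i = 0; i < n; i++) {
        out = adder_cpu(a, b);
    }
    double elapsed = now_ns() - start;

    (void)out; /* result is not needed, only the time taken */
    return elapsed / (double)n;
}
```

The FPGA case would have the same shape, with the loop body replaced by the register writes and read that drive the IP core.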
Here is the adder implemented in Verilog, performing the same operation as the C code.
A close-up of the addition operation being performed in this test:
The actual operation is the same, but the memory operations and sequential logic in the FPGA produce different timing results from the C operation.
Command Added
The command takes N (the number of operations) and two inputs as arguments. There were previously no commands to test timing in the FPGA versus the C code:
adder test [cpu | fpga] <N> <in1> <in2>: Complete the adding operation using user-provided inputs, then get the average time per operation.
Results
Quantified Results
N was changed each time, and both inputs were kept consistent at 0 for each test. The command was run and the results show the average time per operation to compute the sum:
*With N = 500, there was an error in the console and the debugging session was suspended, so no data could be retrieved for the FPGA at that value.
Code Analysis
The Verilog testing code had three operations in it:
Each operation took about the same amount of time, running for the same number of clock cycles. Each AXI operation in Verilog took many clock cycles: with a clock period of 5 nanoseconds and a total runtime of 396 nanoseconds across the three operations, each operation is estimated at approximately 26 clock cycles, or 132 nanoseconds.
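The estimate above is plain arithmetic, sketched here so the numbers can be re-derived. The helper names `ns_per_op` and `ns_to_cycles` are hypothetical; the inputs (396 ns total, three operations, 5 ns period) are the figures quoted in the text.

```c
#include <assert.h>

/* Time per operation when a total runtime is split evenly across ops. */
static double ns_per_op(double total_ns, unsigned ops)
{
    assert(ops > 0);
    return total_ns / (double)ops;
}

/* Convert a duration in nanoseconds to clock cycles for a given period. */
static double ns_to_cycles(double ns, double period_ns)
{
    assert(period_ns > 0.0);
    return ns / period_ns;
}
```

With the figures from the text, `ns_per_op(396.0, 3)` gives 132 ns and `ns_to_cycles(132.0, 5.0)` gives 26.4 cycles, which matches the "approximately 26 clock cycles" estimate.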
Changing the volatile keyword on the inputs/output in the command changed the timing slightly. When none were declared volatile, the code was optimized and did not fully compute the sum for N operations; this was concluded from a time of ~18 nanoseconds for the Verilog code for N = 100 operations.
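The effect of volatile on a timing loop can be seen in a small sketch. This is not the command code from the PR; `add_n_plain` and `add_n_volatile` are hypothetical names used only to contrast the two cases.

```c
#include <stdint.h>

/* Without volatile: the result does not depend on the iteration, so an
 * optimizing compiler is free to compute the sum once, or to drop the
 * loop entirely, which makes any timing of the loop meaningless. */
static int32_t add_n_plain(uint32_t n, int32_t in1, int32_t in2)
{
    int32_t out = 0;
    for (uint32_t i = 0; i < n; i++) {
        out = in1 + in2;
    }
    return out;
}

/* With volatile: every iteration must read a and b and write out,
 * so all n additions are actually performed. */
static int32_t add_n_volatile(uint32_t n, int32_t in1, int32_t in2)
{
    volatile int32_t a = in1;
    volatile int32_t b = in2;
    volatile int32_t out = 0;
    for (uint32_t i = 0; i < n; i++) {
        out = a + b;
    }
    return out;
}
```

Both functions return the same value; the difference only shows up in the generated code and therefore in the measured time.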
The C code operation is so simple that the multiplications by 8 and 4 are recognized as shifts by 3 and 2 and completed at the same time as the loading operation.
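The equivalence being relied on here is that multiplying by a power of two is the same as a left shift. A minimal sketch, with hypothetical function names and unsigned operands to keep the behavior well defined:

```c
#include <stdint.h>

/* Multiplication by 8 and the equivalent shift by 3. */
static uint32_t times8_mul(uint32_t x)   { return x * 8u; }
static uint32_t times8_shift(uint32_t x) { return x << 3; }

/* Multiplication by 4 and the equivalent shift by 2. */
static uint32_t times4_mul(uint32_t x)   { return x * 4u; }
static uint32_t times4_shift(uint32_t x) { return x << 2; }
```

Whether the shift is then folded into the load depends on the compiler and the target instruction set, so treat that part as an observation about this particular build rather than a general guarantee.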
This code would act as an accelerator, then, if the C code took ~400 nanoseconds or longer, or more generally for a more complex operation. If the C code takes less time than that, the Verilog code is slower.
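The break-even reasoning can be written as a one-line check. This is a simplified sketch: `FPGA_ROUND_TRIP_NS` and `fpga_is_faster` are hypothetical names, and it assumes the FPGA cost is dominated by the measured 396 ns of AXI traffic regardless of how complex the operation inside the IP core is.

```c
#include <stdbool.h>

/* Assumed fixed cost of one FPGA call: the 396 ns total runtime
 * measured for the three AXI operations. */
#define FPGA_ROUND_TRIP_NS 396.0

/* True when offloading beats running the operation on the CPU. */
static bool fpga_is_faster(double cpu_op_ns)
{
    return cpu_op_ns > FPGA_ROUND_TRIP_NS;
}
```

For the simple adder tested here the CPU time is far below this threshold, which is why the Verilog version comes out slower.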