FPGA vs CPU Timing Analysis

abhipriyabansal commented 11 months ago

This PR compares the time taken by the FPGA and by the CPU to run equivalent code that computes an arithmetic operation on the REV E AMDC.

Changes to Overall Design

This PR adds an IP core called my_custom_adder that implements the adder on the FPGA, plus a user app that times the adder on the FPGA against the same computation on the CPU. App and command files are added to run the operation and analyze the timing.

Observations

A variety of tests were run with different numbers of operations and different inputs. The results are shown below; each line represents the time taken to compute the output for one set of inputs across multiple numbers of operations (for both FPGA and CPU).

[Figures: line_graph_cpu, line_graph_fpga]

The FPGA takes approximately 150 - 400 ns to compute the output, while the CPU takes about 12 - 15 ns.
The time taken for the FPGA to complete the arithmetic increases significantly as the number of operations increases. One possible explanation is that some other operation runs in the background between arithmetic operations.
For both the FPGA and the CPU, the time taken per operation depends on the number of operations: runs with different inputs but the same number of operations take similar time per operation.

Results

The main code under test is:

            // Compute result using CPU
            out = 8*in1 + in2/4 - 10203;

            // Compute result using FPGA
            base_addr[0] = in1;
            dmb();
            base_addr[1] = in2;
            dmb();
            out = base_addr[2];
            dmb();

While the code run by the CPU would take more steps to reach the output, since multiple registers are involved, the code run by the FPGA takes longer because 3 AXI transactions occur (2 write operations for the inputs and 1 read operation for the output) in addition to the single computation step.
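The two code paths above can be wrapped as functions to make the three bus accesses explicit. This is a minimal sketch, not the code in the PR: the function names adder_cpu and adder_fpga, the int32_t operand type, and the use of __sync_synchronize() as a stand-in for dmb() are all assumptions. The register offsets 0, 1, and 2 are taken from the snippet above.

```c
#include <stdint.h>

/* Assumption: stand-in for the dmb() call in the PR snippet. On the target,
 * the platform's own data memory barrier would be used instead.
 * __sync_synchronize() is a GCC/Clang builtin full memory barrier. */
#define dmb() __sync_synchronize()

/* CPU path: the expression from the PR, evaluated directly.
 * The int32_t operand type is an assumption. */
int32_t adder_cpu(int32_t in1, int32_t in2)
{
    return 8 * in1 + in2 / 4 - 10203;
}

/* FPGA path: two register writes (inputs) and one register read (output),
 * i.e. three AXI transactions per result. */
int32_t adder_fpga(volatile int32_t *base_addr, int32_t in1, int32_t in2)
{
    base_addr[0] = in1;         /* AXI write 1 */
    dmb();
    base_addr[1] = in2;         /* AXI write 2 */
    dmb();
    int32_t out = base_addr[2]; /* AXI read */
    dmb();
    return out;
}
```

On hardware, base_addr would point at the IP core's register block. Comparing adder_fpga against adder_cpu for the same inputs is one way to check the core, assuming the core implements the same integer arithmetic.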
The AXI interface runs at a frequency of 200 MHz (i.e., a 5 ns clock period) and takes about 9 clock cycles per read/write operation. One AXI transaction therefore takes approximately 45 ns, and the 3 transactions together take about 135 ns. The graph shows that the FPGA takes about 150 ns to compute the output (for one operation), so the measurement is consistent with this estimate.
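The estimate above can be reproduced with a few lines of arithmetic. This is a sketch using the figures quoted in this PR (200 MHz, about 9 cycles per transaction, 3 transactions); the function names are hypothetical.

```c
/* Clock period in nanoseconds for a clock frequency given in MHz. */
double clock_period_ns(double freq_mhz)
{
    return 1000.0 / freq_mhz;
}

/* Estimated bus time for num_txn AXI transactions, each taking
 * cycles_per_txn clock cycles. Ignores the FPGA compute step itself. */
double axi_latency_ns(double freq_mhz, unsigned cycles_per_txn, unsigned num_txn)
{
    return clock_period_ns(freq_mhz) * cycles_per_txn * num_txn;
}
```

With these figures, axi_latency_ns(200.0, 9, 3) gives 135 ns, which is close to the roughly 150 ns measured for a single operation.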

In conclusion, the FPGA does not work as an accelerator in this case, because the operation is simple enough that the C code on the CPU runs faster. If the operation were more complex, or if the AXI transactions were faster, the FPGA could accelerate it.
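The break-even condition behind this conclusion can be written as a simple comparison. This is a rough sketch under stated assumptions: offloading pays off only when the CPU time for the operation exceeds the fixed AXI overhead plus the FPGA compute time. The function names are hypothetical, and the FPGA compute time is a free parameter because this PR does not measure it separately.

```c
/* Ratio of CPU time to total FPGA time (bus overhead plus compute).
 * A value above 1.0 means the FPGA path is faster. */
double offload_speedup(double cpu_ns, double axi_overhead_ns, double fpga_compute_ns)
{
    return cpu_ns / (axi_overhead_ns + fpga_compute_ns);
}

/* Returns 1 if offloading to the FPGA is faster than the CPU, else 0. */
int offload_pays_off(double cpu_ns, double axi_overhead_ns, double fpga_compute_ns)
{
    return offload_speedup(cpu_ns, axi_overhead_ns, fpga_compute_ns) > 1.0;
}
```

With the numbers observed here (about 15 ns on the CPU against about 135 ns of AXI overhead), offload_pays_off returns 0 even if the FPGA compute time is taken as zero. It would return 1 only for an operation whose CPU time exceeds the total FPGA time.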

npetersen2 commented 11 months ago

@abhipriyabansal Looking good! Please fix the formatting issues, then I will review. Thanks

npetersen2 commented 8 months ago

Closing this PR since it is a practice exercise for new contributor onboarding.

Severson-Group / AMDC-Firmware