This project aims to design an 32-point FFT (Fast Fourier Transform) based DIT (decimation in time) Butterfly Algorithm with multiple clock domains and time-shared design.
The design procedure is aimed to be work for the both digital designs flow FPGA and ASIC.
The FPGA flow will start from writing the RTL to the bitstream generation on the Xilinx Vivado tool.
The ASIC flow will start from writing the RTL to GDS file generation on Synopsys EDA tools.
Transforms are generally used to convert a function from time domain to frequency domain without any loss of information, The Fast Fourier transforms (FFT) are the numerically efficient algorithms to compute the Discrete Fourier Transform (DFT).
FFT algorithms are based on the concept of divide and conquer approach. The FFT is used in so many applications as In the field of communications, the FFT is important because of its use in orthogonal frequency division multiplexing (OFDM) systems.
In this project it was implemented the FFT for a 32-points sequence with the help of Decimation In Time algorithm with radix-2.
Here is more details on the modules that used in the design.
The MAC unit consists of eight multipliers, four adders and four two’s complement sages. MAC unit Design which is named “butterfly2” in our design and has 8 inputs (real and imaginary of input 1and input 2, real and imaginary of twiddle 1 and twiddle 2) and 4 imaginary (real and imaginary of output 1and output 2).
The function of this stage is only to get the two’s complement of the fixed-point number, instead of making subtractions blocks.
the multiplier is designed for a fixed-point operation, the Idea of the multiplier is that the ordinary multiplier is worked for two positive fixed-point number, so it is needed to ensure that the two multiplicands are positive to give the correct answer, so if one of the inputs or both of them are negative fixed-point numbers, first it is needed to take the two’s compliment for them, then apply the multiplication and depending on the inputs decide the shape of the output which is two’s complement or a positive fixed-point value.
The design of the adder is very simple, it is need to add three numbers together, so by adding the two of them then take the result and add it with the third and put it into the register that clocked with 50 MHz
The final design of the mac block which is run on 50MHz.
Registers are necessary to save outputs after each stage, as we achieved the concept of the pipeline by putting 2 register files one at the input stage and the other on the output stage. Each register file is of size 32 𝑟𝑒𝑔𝑖𝑠𝑡𝑒𝑟𝑠 × 2 (𝑟𝑒𝑎𝑙 𝑜𝑟 𝑖𝑚𝑎𝑔) × 8 𝑏𝑖𝑡𝑠 = 32 × 2 × 8 registers and run on Clk 10MHz. as we can enter input and get output every 100ns.
Mux will be used in second approach, As we use Muxs to choose which inputs of the stage will enter to 16 MAC circuits to achieve time sharing concept to produce the output. As in the second approach we will use 16 MAC circuits only to cover all the 5 stages but in the first approach we use 16 MAC circuits in every stage. And in the time share implementation, the MUX will choose between 𝑀 = 5 buses, each of size 32 × 2 × 8, with counter selection that resets after reaching count #5 by control unit.
The hardware which used in this approach is 16 MAC’s, 2 register files, a MUX that chooses which inputs will be loaded this cycle and a Control unit to choose a selection lines of the Muxs. Each MAC unit operates at 50 MHz and the pipeline clock at 10 MHz, so we can operate 5 operations of the MAC until new inputs enter. As We can use time sharing concept to operate 16 MAC unit to serve 5 stages in the algorithm and hence 5 cycles of clk 50 MHz are needed to get an FFT output for a specific input.So, we could reduce the Utilization of the hardware.
As when inputs enter, it will pass to Reg1 of clk 10Mhz and then will choose of the mux and then will go through MAC units and the outputs of this MAC units will go through Muxes and it will choose according to which stage operate now and that will be repeated until reach to the final stage and the output will go through reg 2 to save it.
Cooley-Tukey DIT-FFT Algorithm was implement using MATLAB to compare the results with the built-if function fft as depicted form the following figures the result of the built-in function and the estimated values of the Algorithm for a random sample of data.
This design was Synthesized and implemented for Xilinx Zynq UltraScale+ (xazu3eg-sfva625-1-i) FPGA, and all the following results are met.