Severson-Group / AMDC-Firmware

Embedded system code (C and Verilog) which runs the AMDC Hardware
http://docs.amdc.dev/firmware
BSD 3-Clause "New" or "Revised" License

Profile math operations running in C code #395

Open npetersen2 opened 1 month ago

Introduction

This issue documents @Known4225's AMDC platform onboarding project. The project spans C firmware development, the Python interface to the AMDC, and a new page on the docs.amdc.dev website which summarizes your findings.

Goal: Quantify how long various math operations take to run on the AMDC real-time digital signal processor ("DSP").

Outcome: A report written in markdown and published on the docs website under GETTING STARTED / User Guide / Math Operations.

Background

The AMDC is used for real-time control of motor drive systems: every X seconds, the AMDC samples various sensor inputs, performs some math on the sampled values, and then updates the PWM outputs based on the result. In the default firmware, the value of X is 100 microseconds, i.e., a control rate of 10 kHz. For all of this to work correctly, the firmware must compute the required math operations in a short time, i.e., much less than 100 us.
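To make the timing budget concrete, here is a minimal sketch of the arithmetic. The function names are made up for illustration and are not part of the AMDC firmware. It ignores sampling, PWM updates, and every other source of overhead, so treat the result as an optimistic upper bound.

```c
#include <stdint.h>

/* Control period in microseconds for a given control rate in Hz. */
static double control_period_us(double control_rate_hz)
{
    return 1.0e6 / control_rate_hz;
}

/* Upper bound on how many back-to-back operations fit in one control
 * period, ignoring all other work done in the control loop. */
static uint32_t max_ops_per_period(double control_rate_hz, double time_per_op_us)
{
    return (uint32_t) (control_period_us(control_rate_hz) / time_per_op_us);
}
```

For the default 10 kHz rate, `control_period_us(10000.0)` gives 100 us. Once a measured per-operation time is available, `max_ops_per_period()` shows how quickly the budget is used up.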

The AMDC uses a PicoZed system-on-module for its "brains". This module carries an AMD Xilinx Zynq-7000 system-on-chip, which is the main processor. The SoC combines a dual-core DSP with an FPGA. The code which computes the math operations described above runs on the DSP, which is a standard ARM Cortex-A9 core. This is a relatively powerful processor.

We are interested in understanding how long various math operations take to complete on the Cortex-A9 processor. For example, sin(), sqrt(), /, etc. Your job is to create a framework to measure this, gather the data, and report it in a new docs web page.

Method

I envision this project using 3 core pieces of the AMDC system:

  1. Command handler (and optional state machine) which actually computes the math operation and records time stats
  2. Python scripts which run various tests on the AMDC to collect data and make plots
  3. Markdown file in the website which presents the findings

Command Handler

To collect the timing data from the AMDC, I recommend a system as follows:

First, come up with a full collection of supported math operations to profile. This should ideally be all supported standard math, i.e., from <math.h> header, for example, see here or here.

Then, write a new command handler which allows the user to run the math function and record how long it takes. This should have the following command signature:

math <num_ops> <func> <args>

where <num_ops> is an integer which tells the code how many times to evaluate the function before returning the average run-time, <func> is the math function to use, and <args> are the arguments to the function.

Some examples:

math 50 sin 0 -- compute $sin(0)$ function 50 times and report the average run-time duration
math 10 atan 10 -- compute $atan(10)$ function 10 times and report ....
math 1 atan2 1 2 -- compute $atan2(1, 2)$ function 1 time and report ....
math 100 sqrt rand -- compute $sqrt()$ function 100 times, each with a random input ...

To implement this generally as described will require a somewhat "complex" command handler, but shouldn't be too hard.

To keep track of the run-time, I recommend something like the following (with drv/cpu_timer)

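// N is the number of evaluations, in is the input value
uint32_t total_time_ticks = 0;

for (int i = 0; i < N; i++) {
  uint32_t t0 = cpu_timer_now();
  volatile double out = cos(in); // volatile so the result is not optimized away
  uint32_t t1 = cpu_timer_now();
  total_time_ticks += t1 - t0;
}

// Convert the accumulated tick count to microseconds, then average
double total_time_us = cpu_timer_ticks_to_usec(total_time_ticks);
double time_per_op_us = total_time_us / N;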

You can also think about using the sys/statistics module to have more complete stats, like mean, max, min, std dev, etc.

A few notes:

Python Data Collection

Now that the AMDC firmware has the handler to measure the run-time, automate the data collection using the Python host interface and a Jupyter notebook.

For example, collect all data automatically as:

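import time

# Assumes `amdc` is an already-connected AMDC host interface object
funcs_to_run = ["sin", "cos", "exp", "sqrt", "log", "pow", "floor"]

for func in funcs_to_run:
    # Run the test: 20 evaluations, each with a random input
    resp = amdc.cmd("math 20 %s rand" % func)
    print("Measured time:", resp[2])

    # Give AMDC a break between tests
    time.sleep(0.1)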

Then, generate a plot of the findings, for example:

Website report

Follow the instructions on the docs.amdc.dev repo to set up the Sphinx build system to build the website locally. Then, add a new page for the report of this work. @codecubepi or @npetersen2 can give you support on getting the docs website build system up and running.

Make the report read as a self-contained document that explains the purpose, background, and test procedure, and presents the results.

Present results in graphs whenever possible, rendered with matplotlib directly from the Jupyter notebook above. Include them as SVG files in the website (see other docs website pages for examples).

Bonus challenge: code acceleration

Using all your results, come up with a couple of complicated and slow math operations which can be accelerated by using a different code implementation. I can help you with this once you have the results for each math operation.

For example, one complicated math operation is to compute the normalized 2D cross-product of two vectors to find the angle error between them. This involves normalizing both vectors to unit length (while preserving their directions), and then computing the actual cross product. This is quite slow and can probably be sped up by using only "fast" math operations.

Another example is a 2D vector rotation, for example, written in complex notation, out = in * exp(j * theta). This will end up implemented as cos/sin ops and multiply/accumulates. What is the fastest way to write the code to do this?