This issue documents @Known4225's AMDC platform onboarding project. This project will span C code firmware development, Python interface to AMDC, and creating a new page on the docs.amdc.dev website which summarizes your findings.
Goal: Quantify how long various math operations take to run on the AMDC real-time digital signal processor ("DSP").
Outcome: Report written in markdown and published on the docs website under GETTING STARTED / User Guide / Math Operations
Background
The AMDC is used for real-time control of motor drive systems: every X seconds, the AMDC samples various sensor inputs, performs some math on the sampled values, and then updates the PWM outputs based on the results. In the default firmware, the value of X is 100 microseconds, which corresponds to a control rate of 10 kHz. For this all to work correctly, the firmware must compute the required math operations in a short time, i.e., much less than 100 us.
The AMDC uses a PicoZed system-on-module for its "brains". This module carries an AMD Xilinx Zynq-7000 system-on-chip, which is the main processor. This processor has a dual-core DSP and an FPGA. The code which computes the math operations as described above runs on the DSP. The DSP is a standard ARM Cortex-A9 core. This is a relatively powerful processor.
We are interested in understanding how long various math operations take to complete on the Cortex-A9 processor. For example, sin(), sqrt(), /, etc. Your job is to create a framework to measure this, gather the data, and report it in a new docs web page.
Method
I envision this project using 3 core pieces of the AMDC system:
Command handler (and optional state machine) which actually computes the math operation and records time stats
Python scripts which run various tests on the AMDC to collect data and make plots
Markdown file in the website which presents the findings
Command Handler
To collect the timing data from the AMDC, I recommend a system as follows:
First, come up with a full collection of supported math operations to profile. This should ideally be all supported standard math, i.e., from <math.h> header, for example, see here or here.
Then, write a new command handler which allows the user to run the math function and record how long it takes. This should have the following command signature:
math <num_ops> <func> <args>
where <num_ops> is an integer which tells the code how many times to evaluate the function before returning the average run-time, <func> is the math function to use, and <args> are the arguments to the function.
Some examples:
math 50 sin 0 -- compute $sin(0)$ function 50 times and report the average run-time duration
math 10 atan 10 -- compute $atan(10)$ function 10 times and report ....
math 1 atan2 1 2 -- compute $atan2(1, 2)$ function 1 time and report ....
math 100 sqrt rand -- compute $sqrt()$ function 100 times, each with a random input
...
To implement this generally as described will require a somewhat "complex" command handler, but shouldn't be too hard.
To keep track of the run-time, I recommend using drv/cpu_timer: read the timer before and after the math operations and take the difference.
You can also think about using the sys/statistics module to have more complete stats, like mean, max, min, std dev, etc.
A few notes:
Include test support for basic math: +, -, /, and *
Some math ops require 2 input arguments
Consider implementing this for different data types: double, float, and int. The AMDC DSP natively supports double precision, so I do not think the time will be much faster for float vs double, but it would be very interesting to find out.
You will probably need to make sure all inputs and outputs of the math are volatile type so that the compiler actually performs the math. Make sure to do a sanity check at some point to ensure things are working as expected and the code is actually computing the right numbers.
Make sure to limit the total ops per command handler call to a "small" number, like 100 or something, to ensure the total test time is short enough. This is due to the cooperative scheduler on the AMDC.
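A few of these notes can be illustrated together. The sketch below is only one possible arrangement, and every name in it (`MATH_MAX_OPS`, `clamp_num_ops`, `div_loop_float`, `div_loop_double`, `div_loop_int`) is hypothetical: it caps the operation count, uses volatile inputs and outputs so the compiler cannot remove or hoist the division, and provides one loop per data type so that float, double, and int can be compared. The timing itself is left out to keep the focus on the loop bodies.

```c
#include <stdint.h>

// Hypothetical cap on operations per command, to keep each call short
#define MATH_MAX_OPS (100)

uint32_t clamp_num_ops(uint32_t requested)
{
    return (requested > MATH_MAX_OPS) ? MATH_MAX_OPS : requested;
}

// Single-precision divide, repeated num_ops times (capped).
// volatile forces a real load, divide, and store on every iteration.
float div_loop_float(float a, float b, uint32_t num_ops)
{
    volatile float in_a = a;
    volatile float in_b = b;
    volatile float out = 0.0f;
    num_ops = clamp_num_ops(num_ops);
    for (uint32_t i = 0; i < num_ops; i++) {
        out = in_a / in_b;
    }
    return out; // returned so the result can be sanity checked
}

// Double-precision variant of the same loop.
double div_loop_double(double a, double b, uint32_t num_ops)
{
    volatile double in_a = a;
    volatile double in_b = b;
    volatile double out = 0.0;
    num_ops = clamp_num_ops(num_ops);
    for (uint32_t i = 0; i < num_ops; i++) {
        out = in_a / in_b;
    }
    return out;
}

// Integer variant; the caller must make sure b is not zero.
int32_t div_loop_int(int32_t a, int32_t b, uint32_t num_ops)
{
    volatile int32_t in_a = a;
    volatile int32_t in_b = b;
    volatile int32_t out = 0;
    num_ops = clamp_num_ops(num_ops);
    for (uint32_t i = 0; i < num_ops; i++) {
        out = in_a / in_b;
    }
    return out;
}
```

Returning the last result makes the sanity check straightforward: the handler can compare it against the value expected for the given inputs before reporting any timing.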
Python Data Collection
Now that the AMDC firmware has the handler to measure the run-time, automate the data collection using the Python host interface and a Jupyter notebook.
For example, collect all data automatically as:
import time

# `amdc` is assumed to be an already-connected AMDC host interface object
funcs_to_run = ["sin", "cos", "exp", "sqrt", "log", "pow", "floor"]
for func in funcs_to_run:
    # Run the test
    resp = amdc.cmd("math 20 %s rand" % func)
    print("Measured time:", resp[2])
    # Give AMDC a break between tests
    time.sleep(0.1)
Then, generate a plot of the findings, for example, the measured run-time of each function.
Website report
Follow the instructions on the docs.amdc.dev repo to set up the Sphinx build system to build the website locally. Then, add a new page for the report of this work. @codecubepi or @npetersen2 can give you support on getting the docs website build system up and running.
Make the report read as a self-contained document where it explains the purpose, background, test procedure, and gives the results.
Present results in graphs whenever possible, rendered with matplotlib directly from the Jupyter notebook above. Include them as SVG files in the website (see other docs website pages for examples).
Bonus challenge: code acceleration
Using all your results, come up with a couple of complicated and slow math operations which can be accelerated by using a different code implementation. I can help you with this once you have the results for each math operation.
For example, one complicated math operation is to compute the normalized 2D cross-product of two vectors to find the angle error between them. This involves normalization of the vector lengths to be 1 (but keeping the right angle), and then the actual cross product. This is quite slow and can probably be sped up by using only "fast" math operations.
Another example is a 2D vector rotation, for example, written in complex notation, out = in * exp(j * theta). This will end up implemented as cos/sin ops and multiply/accumulates. What is the fastest way to write the code to do this?