tohwsw closed this issue 2 years ago
Please add clear steps on how to reproduce your performance measurements.
The flamegraphs do not carry useful information: very few function names appear in both profiles. linux-perf should measure only the compute process, not the whole system. One of the flamegraphs shows more than 90% of the time in do_idle.
Hi Sebastian, I'm uploading the code here with a makefile included: quantasianoptionpricing.zip. The program is short, so it takes about 2-3 s to complete.
Most of the time spent on c6g is in the __random function.
49.21% pricing libc-2.31.so [.] __random
19.15% pricing pricing [.] gaussian_box_muller
14.11% pricing libm-2.31.so [.] exp@@GLIBC_2.29
gaussian_box_muller has a loop that iterates a random number of times based on the output of rand():
do {
x = 2.0 * rand() / static_cast<double>(RAND_MAX)-1;
y = 2.0 * rand() / static_cast<double>(RAND_MAX)-1;
euclid_sq = x*x + y*y;
} while (euclid_sq >= 1.0);
The total amount of time reported by the program depends on the output of rand(). This is not something we should be looking at with linux-perf.
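One way to make the two runs comparable is to take rand() out of the picture. The following is only a sketch under stated assumptions: it swaps rand() for std::mt19937_64 with a fixed seed (the C++ standard specifies that engine's output sequence, so both machines should see the same raw numbers) and counts the passes through the rejection loop. The names gaussian_box_muller_seeded, GaussianSample and total_draws are hypothetical and not taken from the uploaded program; the return expression is the usual polar Box-Muller transform, assumed here because the thread only shows the loop; the extra check for euclid_sq == 0.0 is an addition to avoid log(0).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical result type: the sample plus the number of loop passes it cost.
struct GaussianSample {
    double value;
    std::uint64_t draws;  // passes through the rejection loop
};

// Polar Box-Muller driven by a caller-supplied engine instead of rand().
// Assumption: this mirrors the loop quoted above; the rest is the textbook transform.
GaussianSample gaussian_box_muller_seeded(std::mt19937_64& gen) {
    const double scale = static_cast<double>(std::mt19937_64::max());
    double x = 0.0, y = 0.0, euclid_sq = 0.0;
    std::uint64_t draws = 0;
    do {
        // Map the raw 64-bit output to [-1, 1].
        x = 2.0 * static_cast<double>(gen()) / scale - 1.0;
        y = 2.0 * static_cast<double>(gen()) / scale - 1.0;
        euclid_sq = x * x + y * y;
        ++draws;
    } while (euclid_sq >= 1.0 || euclid_sq == 0.0);
    assert(euclid_sq > 0.0 && euclid_sq < 1.0);
    return {x * std::sqrt(-2.0 * std::log(euclid_sq) / euclid_sq), draws};
}

// Total loop passes needed to produce `samples` Gaussian values from a given seed.
std::uint64_t total_draws(std::uint64_t seed, std::size_t samples) {
    std::mt19937_64 gen(seed);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        total += gaussian_box_muller_seeded(gen).draws;
    }
    return total;
}
```

Printing total_draws for the same seed on c6g and c5 would show whether both machines do the same amount of work before their timings are compared. Compiler floating-point contraction could in principle change a borderline comparison, so the counts are worth checking rather than assuming.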
Hi, we were benchmarking Graviton (c6g.2xlarge) against non-Graviton (c5.2xlarge), and the calculations seem to be slower on Graviton than on its non-Graviton counterpart. The setup:
Here are the flame graphs of the two runs.
From the graphs, it seems the function calc_path_spot_prices takes more time on Graviton. I had a look and realised that the function uses exp in its calculations. Is the math library not optimized on ARM? How can we optimize the math routines?
Thanks for your help.