UCL-RITS / pi_examples

A lot of ways to run the same way of calculating pi. Some of them are dumb.
Creative Commons Zero v1.0 Universal

Optimise Rust build #13

Closed: giordano closed this 2 years ago

giordano commented 2 years ago

Benchmarks on Myriad. Before PR:

[cceamgi@login13 rust_pi_dir]$ ./run.sh
rm -f pi
rustc pi.rs
Calculating PI using:
  100000000 slices
  1 threads
Obtained value of PI: 3.1415926535904264
Time Elapsed: 5.997680755 seconds

After PR:

[cceamgi@login13 rust_pi_dir]$ ./run.sh
rm -f pi
rustc -O pi.rs
Calculating PI using:
  100000000 slices
  1 threads
Obtained value of PI: 3.1415926535904264
Time Elapsed: 0.186200917 seconds
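
For readers without the repository at hand, here is a minimal sketch of the kind of loop such a benchmark times. It assumes the usual midpoint-rule integration of 4 / (1 + x^2) over [0, 1]; `pi.rs` itself is not shown in this thread, so the function name `midpoint_pi` and the slice count are illustrative only. Building it once with `rustc pi.rs` and once with `rustc -O pi.rs` should show the same kind of gap as the two runs above (about 32 times, from 5.997680755 s to 0.186200917 s).

```rust
use std::time::Instant;

/// Midpoint-rule estimate of pi as the integral of 4 / (1 + x^2) over [0, 1].
/// `midpoint_pi` is a hypothetical name, not taken from the repository.
fn midpoint_pi(slices: u64) -> f64 {
    let width = 1.0 / slices as f64;
    let mut sum = 0.0;
    for i in 0..slices {
        // Evaluate the integrand at the centre of slice i.
        let x = (i as f64 + 0.5) * width;
        sum += 4.0 / (1.0 + x * x);
    }
    sum * width
}

fn main() {
    // Smaller than the benchmark's 100000000 slices so that an
    // unoptimised build still finishes quickly.
    let slices = 10_000_000;
    let start = Instant::now();
    let pi = midpoint_pi(slices);
    let elapsed = start.elapsed();
    println!("Obtained value of PI: {}", pi);
    println!("Time Elapsed: {:.9} seconds", elapsed.as_secs_f64());
}
```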

I haven't looked at the code, but I'm not sure the run above is really using a single thread:

[cceamgi@login13 rust_pi_dir]$ for threads in 2 3 6 9 12 18 36; do OMP_NUM_THREADS=${threads} ./pi; done
Calculating PI using:
  100000000 slices
  2 threads
Obtained value of PI: 3.14159265358991
Time Elapsed: 0.93205899 seconds
Calculating PI using:
  100000000 slices
  3 threads
Obtained value of PI: 3.14159265358957
Time Elapsed: 0.62696469 seconds
Calculating PI using:
  100000000 slices
  6 threads
Obtained value of PI: 3.1415926535896452
Time Elapsed: 0.33730942 seconds
Calculating PI using:
  100000000 slices
  9 threads
Obtained value of PI: 3.1415926535895653
Time Elapsed: 0.21291425 seconds
Calculating PI using:
  100000000 slices
  12 threads
Obtained value of PI: 3.141592653589828
Time Elapsed: 0.16279886 seconds
Calculating PI using:
  100000000 slices
  18 threads
Obtained value of PI: 3.141592653589857
Time Elapsed: 0.11736555 seconds
Calculating PI using:
  100000000 slices
  36 threads
Obtained value of PI: 3.141592653589821
Time Elapsed: 0.11877296 seconds

The 1-thread time of ~0.18 seconds is close to the 12-thread time (~0.16 seconds) and, as printed, much faster than the 2-thread time (~0.93 seconds).
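
One thing that may be worth checking: the times printed by the loop above have eight digits after the decimal point, while the single runs earlier have nine. A possible explanation, and only a guess since `pi.rs` is not shown in this thread, is that the program prints whole seconds and nanoseconds joined by a dot without zero-padding the nanoseconds. If that were the case, `0.93205899` would really mean 0.093205899 seconds, about half the 1-thread time. The sketch below shows the effect; both helper names are hypothetical.

```rust
use std::time::Duration;

/// Joins whole seconds and nanoseconds with a dot, without zero padding.
/// Hypothetical helper: shown only to illustrate how leading zeros get lost.
fn format_unpadded(d: Duration) -> String {
    format!("{}.{}", d.as_secs(), d.subsec_nanos())
}

/// Same, but pads the nanosecond field to nine digits.
fn format_padded(d: Duration) -> String {
    format!("{}.{:09}", d.as_secs(), d.subsec_nanos())
}

fn main() {
    // 93,205,899 ns, i.e. 0.093205899 s
    let d = Duration::new(0, 93_205_899);
    println!("unpadded: {} seconds", format_unpadded(d));
    println!("padded:   {} seconds", format_padded(d));
}
```

Printing `d.as_secs_f64()` with a fixed precision such as `{:.9}` would avoid the problem as well.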

giordano commented 2 years ago

OK, now that performance is much better, I increased the default number of steps to 10^9 (to match the other compiled languages), and I finally get more reasonable results:

[cceamgi@login13 rust_pi_dir]$ for threads in 1 2 3 6 9 12 18 36; do OMP_NUM_THREADS=${threads} ./pi; done
Calculating PI using:
  1000000000 slices
  1 threads
Obtained value of PI: 3.1415926535899708
Time Elapsed: 1.859772146 seconds
Calculating PI using:
  1000000000 slices
  2 threads
Obtained value of PI: 3.141592653589901
Time Elapsed: 0.930759049 seconds
Calculating PI using:
  1000000000 slices
  3 threads
Obtained value of PI: 3.1415926535899623
Time Elapsed: 0.620628970 seconds
Calculating PI using:
  1000000000 slices
  6 threads
Obtained value of PI: 3.141592653589683
Time Elapsed: 0.310554742 seconds
Calculating PI using:
  1000000000 slices
  9 threads
Obtained value of PI: 3.141592653589656
Time Elapsed: 0.207469345 seconds
Calculating PI using:
  1000000000 slices
  12 threads
Obtained value of PI: 3.1415926535898593
Time Elapsed: 0.157268319 seconds
Calculating PI using:
  1000000000 slices
  18 threads
Obtained value of PI: 3.141592653589815
Time Elapsed: 0.105528349 seconds
Calculating PI using:
  1000000000 slices
  36 threads
Obtained value of PI: 3.1415926535898224
Time Elapsed: 0.59943666 seconds

I wonder whether threading has some overhead that becomes noticeable with a small number of steps.
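
A rough way to probe that question is to time the same calculation with few and many slices at different thread counts. The sketch below is not the repository's implementation: it assumes a midpoint-rule sum split into contiguous chunks with `std::thread::scope` from the standard library (Rust 1.63 or later), and `partial_sum` and `parallel_pi` are hypothetical names. With very few slices, the fixed cost of spawning and joining threads can be expected to outweigh the work each thread does.

```rust
use std::thread;
use std::time::Instant;

/// Sums the midpoint-rule terms for slices in `start..end`.
fn partial_sum(start: u64, end: u64, width: f64) -> f64 {
    (start..end)
        .map(|i| {
            let x = (i as f64 + 0.5) * width;
            4.0 / (1.0 + x * x)
        })
        .sum()
}

/// Hypothetical helper: splits the slices into contiguous chunks,
/// one per thread, and adds up the partial sums.
fn parallel_pi(slices: u64, threads: u64) -> f64 {
    let threads = threads.max(1); // guard against a zero thread count
    let width = 1.0 / slices as f64;
    let chunk = (slices + threads - 1) / threads; // ceiling division
    let total: f64 = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let start = (t * chunk).min(slices);
                let end = ((t + 1) * chunk).min(slices);
                s.spawn(move || partial_sum(start, end, width))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    });
    total * width
}

fn main() {
    // Compare a tiny and a large problem size at two thread counts.
    for &slices in &[1_000u64, 10_000_000] {
        for &threads in &[1u64, 4] {
            let start = Instant::now();
            let pi = parallel_pi(slices, threads);
            println!(
                "{} slices, {} threads: {} in {:?}",
                slices,
                threads,
                pi,
                start.elapsed()
            );
        }
    }
}
```

Timings from a single run like this are noisy, so repeating each measurement a few times and building with `rustc -O` gives a fairer picture.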