UCL-RITS / pi_examples

A lot of ways to run the same way of calculating pi. Some of them are dumb.
Creative Commons Zero v1.0 Universal

Set `basesize` for threaded executor of Julia FLoops example #15

Closed giordano closed 2 years ago

giordano commented 2 years ago

I noticed that on Myriad the CPU FLoops example was roughly 10% slower than the other Julia threaded example (5.472 s versus 4.871 s with 18 threads):

```
[cceamgi@login12 julia_floops_pi_dir]$ JULIA_NUM_THREADS=18 julia --project pi_floops.jl 100000000000
  Warming up...done. [0.549s]

Calculating PI using:
  100000000000 slices
  18 thread(s)
Obtained value of PI: 3.141592653589794
Time taken: 5.472 seconds
[cceamgi@login13 julia_threads_dir]$ JULIA_NUM_THREADS=18 ./run.sh 100000000000
  Warming up...done. [0.153s]

Calculating PI using:
  100000000000 slices
  18 thread(s)
Obtained value of PI: 3.1415926535897793
Time taken: 4.871 seconds
```

but we can recover essentially the same performance by tuning the `basesize` option of the `ThreadedEx` executor. With this PR:

```
[cceamgi@login12 julia_floops_pi_dir]$ JULIA_NUM_THREADS=18 julia --project pi_floops.jl 100000000000
  Warming up...done. [0.577s]

Calculating PI using:
  100000000000 slices
  18 thread(s)
Obtained value of PI: 3.1415926535897922
Time taken: 4.882 seconds
```
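
For readers unfamiliar with the option, here is a minimal sketch of how `basesize` can be passed to `ThreadedEx` in an `@floop` loop. It is not the code from this PR: the function name `pi_floops`, the midpoint-rule integrand, and the `basesize` values are all assumptions chosen for illustration, and the value this PR actually uses is not reproduced here.

```julia
using FLoops  # provides @floop and @reduce, and re-exports ThreadedEx

# Hypothetical helper (not the repository's pi_floops.jl):
# midpoint-rule estimate of pi = integral of 4/(1+x^2) over [0, 1].
# `basesize` caps how many loop iterations a single task processes;
# the default used here (n ÷ nthreads) is an assumption for this sketch.
function pi_floops(n::Integer; basesize::Integer = max(1, n ÷ Threads.nthreads()))
    h = 1.0 / n
    @floop ThreadedEx(basesize = basesize) for i in 1:n
        x = (i - 0.5) * h
        @reduce(total += 4.0 / (1.0 + x * x))
    end
    return total * h
end

n = 10_000_000
# One chunk per thread versus four smaller chunks per thread (illustrative values).
println(pi_floops(n))
println(pi_floops(n; basesize = max(1, n ÷ (4 * Threads.nthreads()))))
```

Smaller chunks give the scheduler more tasks to balance across threads at the cost of more task overhead, so a suitable value is probably best found by timing on the target machine, as in the runs above.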