Open danieldeidda opened 5 years ago
SPECT currently does not benefit from OpenMP. This would need work on making the SPECTUB code thread-safe.
I have heard of (and seen) speed-ups with OpenMP for PET. Maybe not very dramatic, but a factor of ~4 should be achievable with more than ~6 physical cores (hyper-threading doesn't really help much).
@danieldeidda can you give some more detail on your system (processors, cores etc., e.g. `lscpu` on Linux)?
It is possible that a relatively recent change (#142) that makes the IO itself thread-safe, e.g. https://github.com/UCL/STIR/blob/8612517cb683b9e3470fcad748b465d9da91ffea/src/buildblock/ProjDataFromStream.cxx#L148-L150 (as opposed to locking at the point where we call it), has slowed it down.
In fact, we should now be able to remove some of the critical sections in `distributable.cxx`, as in https://github.com/UCL/STIR/blob/8612517cb683b9e3470fcad748b465d9da91ffea/src/recon_buildblock/distributable.cxx#L181-L183. Does anyone want to try? (You need a lot of cores to do a decent check.)
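To make the idea concrete, here is a minimal sketch of the two locking patterns being compared. It is not the STIR code: `SafeReader`, `sum_squares_caller_locked` and `sum_squares_reader_locked` and the critical-section names are all hypothetical, made up for illustration. Variant A locks at the call site even though the reader already protects itself; variant B relies on the reader's own lock only, which is roughly what removing the caller-side critical sections would amount to.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data source whose read() is thread-safe
// on its own (the lock lives inside the IO routine).
class SafeReader {
public:
  explicit SafeReader(std::vector<double> data) : data_(std::move(data)) {}

  double read(std::size_t i) const {
    double value = 0.0;
    // Lock inside the IO routine; the name is illustrative only.
#pragma omp critical(SAFE_READER_IO)
    { value = data_[i]; }
    return value;
  }

  std::size_t size() const { return data_.size(); }

private:
  std::vector<double> data_;
};

// Variant A: the caller also wraps the read in its own critical section.
// With a thread-safe reader this second lock is redundant.
double sum_squares_caller_locked(const SafeReader& reader) {
  double total = 0.0;
  const long n = static_cast<long>(reader.size());
#pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    double v = 0.0;
#pragma omp critical(CALLER_SIDE_IO)
    { v = reader.read(static_cast<std::size_t>(i)); }
    total += v * v;  // the "processing" step runs outside any lock
  }
  return total;
}

// Variant B: rely on the reader's internal lock only.
double sum_squares_reader_locked(const SafeReader& reader) {
  double total = 0.0;
  const long n = static_cast<long>(reader.size());
#pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    const double v = reader.read(static_cast<std::size_t>(i));
    total += v * v;
  }
  return total;
}
```

Both variants return the same result; timing them against each other with many threads (compiled with OpenMP enabled, e.g. `-fopenmp` for GCC) would be the kind of check suggested above. Without OpenMP the pragmas are ignored and the code runs serially.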
The above MPI errors are a bug.
This is the output of `lscpu`:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 1000.013
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0-39
I ran SPECT and PET reconstructions using MPI and OpenMP and found the following problems:

OpenMP: it seems to be much slower when I use more threads, with both PET and SPECT data.

MPI: if I set "Enable distributed caching:=1" I get errors. If it is set to zero, I achieve around a factor 4 acceleration with 10 threads. If I use more, I do not get any further acceleration (this is also the case for both SPECT and PET).