UCL / STIR

Software for Tomographic Image Reconstruction
http://stir.sourceforge.net/

multithreading not working properly #324

Open danieldeidda opened 5 years ago

danieldeidda commented 5 years ago

I ran SPECT and PET reconstructions using MPI and OpenMP and hit the following problems:

OpenMP: reconstruction seems to be much slower when I use more threads, with both PET and SPECT data.

MPI: if I set "Enable distributed caching:=1" I get the following:

INFO: Computing sensitivity

INFO: Sending segment 0, view 0 to slave 1

INFO: Sending segment 0, view 20 to slave 2

INFO: Sending segment 0, view 40 to slave 3

WARNING: ProjDataFromStream::set_viewgram: error after seekp

ERROR: Slave 2: Storing viewgrams failed!

INFO: Sending segment 0, view 60 to slave 4

WARNING: ProjDataFromStream::set_viewgram: error after seekp

INFO: Sending segment 0, view 80 to slave 5

ERROR: Slave 3: Storing viewgrams failed!

If it is set to zero, I get around a factor of 4 acceleration with 10 threads. Using more threads gives no further acceleration (this applies to both SPECT and PET).
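As a rough way to read the reported numbers, the sketch below applies Amdahl's law to the "factor 4 with 10 threads" figure. This is an assumption for illustration only: nothing in the thread says the reconstruction follows Amdahl's model, and the function names are hypothetical.

```python
def parallel_fraction(speedup, threads):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


def amdahl_speedup(p, threads):
    """Speed-up predicted by Amdahl's law for parallel fraction p on n threads."""
    return 1.0 / ((1.0 - p) + p / threads)


# Reported in the post: roughly a factor of 4 with 10 threads.
p = parallel_fraction(4.0, 10)

print(f"implied parallel fraction: {p:.3f}")
for n in (10, 20, 40):
    print(f"{n:2d} threads -> predicted speed-up {amdahl_speedup(p, n):.2f}")
print(f"upper bound for many threads: {1.0 / (1.0 - p):.2f}")
```

Under this model about 83% of the run time would be parallel, and the speed-up could never exceed 6 however many threads are added. The model still predicts a small gain beyond 10 threads, so a complete plateau may point to additional overhead such as contention on shared resources.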

KrisThielemans commented 5 years ago

SPECT currently does not benefit from OpenMP. This would need work on making the SPECTUB code thread-safe.

I have heard of (and seen) speed-ups with OpenMP for PET. Maybe not very dramatic, but a factor of ~4 should be achievable with more than ~6 physical cores (hyper-threading doesn't really help much).

@danieldeidda can you give some more detail on your system (processors, cores etc, e.g. lscpu on linux)?

It is possible that a relatively recent change (#142) to make IO thread-safe has slowed it down. That change locks inside the IO functions themselves, e.g. https://github.com/UCL/STIR/blob/8612517cb683b9e3470fcad748b465d9da91ffea/src/buildblock/ProjDataFromStream.cxx#L148-L150, as opposed to at the point where we call them.

In fact, we should now be able to remove some of the critical sections in distributable.cxx, such as the one at https://github.com/UCL/STIR/blob/8612517cb683b9e3470fcad748b465d9da91ffea/src/recon_buildblock/distributable.cxx#L181-L183. Does anyone want to try? (You need a lot of cores to do a decent check.)
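The idea of dropping call-site critical sections once the IO routine locks internally can be sketched generically. The example below is not STIR code: it uses Python's `threading.Lock` as a stand-in for an OpenMP critical section, and `ViewgramStore`, `fill` and `run` are hypothetical names invented for illustration.

```python
import threading


class ViewgramStore:
    """Hypothetical stand-in for a stream-backed store; not the STIR API."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the lock inside the IO routine
        self._data = {}
        self.writes = 0

    def set_viewgram(self, view, values):
        # The write is already serialised here, inside the IO routine.
        with self._lock:
            self._data[view] = list(values)
            self.writes += 1

    def get_viewgram(self, view):
        with self._lock:
            return list(self._data[view])


def fill(store, views, outer_lock=None):
    for view in views:
        values = [view * 10 + i for i in range(4)]  # computation runs unlocked
        if outer_lock is not None:
            # Old pattern: an extra critical section at the call site.
            with outer_lock:
                store.set_viewgram(view, values)
        else:
            # New pattern: rely on the lock inside set_viewgram.
            store.set_viewgram(view, values)


def run(num_threads, num_views, use_outer_lock):
    store = ViewgramStore()
    outer = threading.Lock() if use_outer_lock else None
    threads = [
        threading.Thread(target=fill, args=(store, range(t, num_views, num_threads), outer))
        for t in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store


with_outer = run(4, 100, use_outer_lock=True)
without_outer = run(4, 100, use_outer_lock=False)
print(with_outer.writes, without_outer.writes)
```

Both variants store every viewgram correctly, so in this toy setting the call-site lock adds serialisation without adding safety. Whether the same holds for a given critical section in distributable.cxx depends on what else that section protects, which would need checking case by case.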

KrisThielemans commented 5 years ago

The above MPI errors are a bug.

danieldeidda commented 5 years ago

This is the output of lscpu:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 1000.013
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0-39
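Since hyper-threading was said above not to help much, it can be useful to separate physical cores from logical CPUs in this output. The following is a minimal sketch that parses the relevant `lscpu` fields; the helper names are hypothetical and the sample text copies only the fields posted above.

```python
def parse_lscpu(text):
    """Turn 'Key: value' lines from lscpu into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def core_counts(fields):
    """Return (physical cores, logical CPUs) from parsed lscpu fields."""
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    physical = sockets * cores_per_socket
    return physical, physical * threads_per_core


# Fields copied from the lscpu output posted in this thread.
SAMPLE = """\
CPU(s): 40
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 1
"""

physical, logical = core_counts(parse_lscpu(SAMPLE))
print(physical, logical)  # prints "20 40"
```

For this machine that gives 20 physical cores behind the 40 logical CPUs, so thread counts above 20 would be relying on hyper-threading.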