3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

running RELION on slightly heterogeneous architectures #281

Closed ashkumatov closed 3 years ago

ashkumatov commented 7 years ago

Hi All

We bought 5 nodes about 1.5 years ago and extended our cluster by another 7 nodes this week. The new nodes are somewhat faster but in general comparable to the "old" 5 nodes. I wonder if anyone has experienced a drop in performance on slightly heterogeneous architectures? My gut feeling is that it will not have a significant impact, but I would appreciate any feedback on this. Thank you in advance, Alex

jmansour commented 7 years ago

Assuming all else equivalent (interconnects etc), I'm guessing jobs cast across mixed hardware will simply be limited to the speed of the slower hardware, as the work will be partitioned independently.

So processes running on faster hardware might spend some time (possibly negligible) waiting for processes running on slower hardware. The time waiting will be proportional to the speed difference.

But the devs might have more RELION-specific info to add.

bforsbe commented 7 years ago

During classification runs the difference is even less; there are images to be done and each slave simply asks for N*T images at a time (--pool N --j T). Once those are done it asks for more. If slaves are on different hardware, the faster one will simply do a greater share of the work.
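As a rough illustration of the point above, here is a toy simulation (not RELION code) that compares a fixed equal split against workers that keep asking for the next batch. The function names, worker speeds and image counts are all made up for the example, and communication overhead is ignored.

```python
import heapq


def static_split_time(n_images, speeds):
    """Wall time if every worker gets an equal, fixed share of the images."""
    share = n_images / len(speeds)
    return max(share / s for s in speeds)


def dynamic_pull(n_images, batch, speeds):
    """Wall time and per-worker image counts when each worker repeatedly
    asks for `batch` more images as soon as it is free.

    `speeds` are hypothetical throughputs in images per second.
    """
    done = [0] * len(speeds)
    remaining = n_images
    # Min-heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    while remaining > 0:
        t, w = heapq.heappop(free_at)      # next worker to become free
        take = min(batch, remaining)
        remaining -= take
        done[w] += take
        t_end = t + take / speeds[w]
        finish = max(finish, t_end)
        heapq.heappush(free_at, (t_end, w))
    return finish, done


if __name__ == "__main__":
    speeds = [1.0, 2.0]                    # assumed: one slow, one fast worker
    wall, done = dynamic_pull(1200, 12, speeds)
    print("dynamic:", wall, done)
    print("static :", static_split_time(1200, speeds))
```

With these assumed speeds the faster worker ends up with roughly two thirds of the images, and the run finishes well before the equal-split case, which is held to the slower worker's pace.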

If, however, threads of a single MPI-slave are on different hardware, what you presume is entirely correct; the slowest thread to finish will limit performance. This is because N*T particles have to be done before another N*T can be begun. If one thread is slow to complete its N images, then the other T-1 threads will hang around and waste computational power/time.
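A back-of-the-envelope sketch of that barrier effect, assuming each of the T threads handles N images per round and a round ends only when the slowest thread is done. The function names and speeds are hypothetical, and this is a simplified model rather than how RELION measures anything.

```python
def round_time(pool, thread_speeds):
    """Time for one round of pool * T images: the slowest thread sets the pace."""
    return max(pool / s for s in thread_speeds)


def idle_fraction(pool, thread_speeds):
    """Fraction of total thread-time spent waiting for the slowest thread."""
    t_round = round_time(pool, thread_speeds)
    busy = sum(pool / s for s in thread_speeds)
    return 1.0 - busy / (t_round * len(thread_speeds))


if __name__ == "__main__":
    # Assumed: --pool 3 --j 4, with one thread at half the speed of the others.
    speeds = [1.0, 1.0, 1.0, 0.5]
    print(round_time(3, speeds))
    print(idle_fraction(3, speeds))
```

In this model the three faster threads sit idle for half of every round, so the whole slave runs at the pace of its slowest thread.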

For refinements, the above applies with the added caveat that the data is kept in separate halves. This means that half of the slaves will do half of the data and never help with the rest. If one half has all the good resources (GPUs), then you will only run as fast as the slowest half.
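To put numbers on the half-set point, here is a small sketch that assumes ideal work sharing inside each half and no sharing between halves. The speeds and the function name are invented for illustration.

```python
def refinement_time(n_images, half_a_speeds, half_b_speeds):
    """Wall time when each half-set is processed only by its own slaves.

    Each half holds n_images / 2 images; within a half the slaves are assumed
    to share work perfectly, so a half runs at the sum of its slaves' speeds.
    """
    per_half = n_images / 2
    time_a = per_half / sum(half_a_speeds)
    time_b = per_half / sum(half_b_speeds)
    return max(time_a, time_b)


if __name__ == "__main__":
    fast, slow = 4.0, 1.0                  # assumed images per second
    # All fast slaves in one half, all slow ones in the other.
    print(refinement_time(1000, [fast, fast], [slow, slow]))
    # One fast and one slow slave in each half.
    print(refinement_time(1000, [fast, slow], [fast, slow]))
```

Under these assumptions, spreading the fast and slow slaves evenly across the two halves is noticeably quicker than putting all the fast ones in the same half.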

jmansour commented 7 years ago

Ah right. I assumed (incorrectly) that each slave would handle a fixed batch of particles. Thanks for the clarification.

ashkumatov commented 7 years ago

Thanks for the explanation!