3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

issues with distributing job across GPUs #576

Closed by kellogg-cryoem 4 years ago

kellogg-cryoem commented 4 years ago

I’ve encountered this very weird bug in RELION, my version is 3.0.5.

  1. If I specify a GPU list (with more than one GPU) but threads = 1, then all processes get assigned to the first device:

/usr/local/bin/mpirun -np 5 `which relion_refine_mpi` --o Refine3D/job023/run --auto_refine --split_random_halves --i smalltest.star --ref cryosparc_P23_J87_005_volume_map.mrc --ini_high 5 --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --pad 2 --ctf --ctf_corrected_ref --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --pipeline_control Refine3D/job023/ --dont_check_norm --gpu 0,1,2,3

Slave 1 will distribute threads over devices 0 1 2 3
Thread 0 on slave 1 mapped to device 0
Slave 2 will distribute threads over devices 0 1 2 3
Thread 0 on slave 2 mapped to device 0
Slave 3 will distribute threads over devices 0 1 2 3
Thread 0 on slave 3 mapped to device 0
Slave 4 will distribute threads over devices 0 1 2 3
Thread 0 on slave 4 mapped to device 0
Device 0 on cbsukellogg.biohpc.cornell.edu is split between 4 slaves

  2. If I pass --gpu without a device list and threads = 1, then all processes are distributed evenly across devices:

/usr/local/bin/mpirun -np 5 `which relion_refine_mpi` --o Refine3D/job023/run --auto_refine --split_random_halves --i smalltest.star --ref cryosparc_P23_J87_005_volume_map.mrc --ini_high 5 --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --pad 2 --ctf --ctf_corrected_ref --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --pipeline_control Refine3D/job023/ --dont_check_norm --gpu

GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 2 mapped to device 2
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 3 mapped to device 4
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 4 mapped to device 6
^C
Running CPU instructions in double precision.

  3. If I specify a GPU list and threads > 1, then all processes are distributed evenly across devices:

/usr/local/bin/mpirun -np 5 `which relion_refine_mpi` --o Refine3D/job023/run --auto_refine --split_random_halves --i smalltest.star --ref cryosparc_P23_J87_005_volume_map.mrc --ini_high 5 --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --pad 2 --ctf --ctf_corrected_ref --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --pipeline_control Refine3D/job023/ --dont_check_norm --gpu 0,1,2,3 --j 5

Slave 1 will distribute threads over devices 0 1 2 3
Thread 0 on slave 1 mapped to device 0
Thread 1 on slave 1 mapped to device 1
Thread 2 on slave 1 mapped to device 2
Thread 3 on slave 1 mapped to device 3
Thread 4 on slave 1 mapped to device 0
Slave 2 will distribute threads over devices 0 1 2 3
Thread 0 on slave 2 mapped to device 0
Thread 1 on slave 2 mapped to device 1
Thread 2 on slave 2 mapped to device 2
Thread 3 on slave 2 mapped to device 3
Thread 4 on slave 2 mapped to device 0
Slave 3 will distribute threads over devices 0 1 2 3
Thread 0 on slave 3 mapped to device 0
Thread 1 on slave 3 mapped to device 1
Thread 2 on slave 3 mapped to device 2
Thread 3 on slave 3 mapped to device 3
Thread 4 on slave 3 mapped to device 0
Slave 4 will distribute threads over devices 0 1 2 3
Thread 0 on slave 4 mapped to device 0
Thread 1 on slave 4 mapped to device 1
Thread 2 on slave 4 mapped to device 2
Thread 3 on slave 4 mapped to device 3
Thread 4 on slave 4 mapped to device 0
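A quick way to watch which devices the ranks actually land on while a job runs is to monitor the cards directly (assuming nvidia-smi is available on the node):

watch -n 1 nvidia-smi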

I can’t figure out if this bug has already been reported/addressed or not.

bforsbe commented 4 years ago

This is all working as expected. What did you expect in each case?

kellogg-cryoem commented 4 years ago

Maybe I didn't explain it clearly enough. In all cases the processes should be distributed evenly across the available devices. In the FIRST case, every rank gets assigned to the first device (0), even though I launched multiple processes (-np 5) and gave the list of devices that should be used (0-3). The first case is the one that violates expectations; cases 2 and 3 are normal.

biochem-fan commented 4 years ago

Are you aware of the difference between comma and colon?

Provide a list of which GPUs (0,1,2,3, etc) to use. MPI-processes are separated by ':'. For example, to place one rank on device 0 and one rank on device 1, provide '0:1'.
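If I read that help text correctly, a comma-separated list is handed to every rank, and each rank then spreads its own threads over the whole list; with --j 1 each rank's single thread therefore starts at the first device, which matches the first log above. To pin the slave ranks to separate devices, the list can be colon-separated instead. A sketch, reusing the command from the first case with only the --gpu argument changed:

/usr/local/bin/mpirun -np 5 `which relion_refine_mpi` --o Refine3D/job023/run --auto_refine --split_random_halves --i smalltest.star --ref cryosparc_P23_J87_005_volume_map.mrc --ini_high 5 --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --pad 2 --ctf --ctf_corrected_ref --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --pipeline_control Refine3D/job023/ --dont_check_norm --gpu 0:1:2:3

With --auto_refine and -np 5 there is one master plus four working ranks, so the four colon-separated entries should place one rank on each of devices 0-3.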

kellogg-cryoem commented 4 years ago

-_- thank you Takanori. It works, my mistake.