Closed by cvsindelar 4 years ago
Thanks for the test case; this is very useful. Unfortunately, 0001.mrcs seems broken. Can you double-check the contents of the archive?
My apologies! The current attachment should fix this.
Omitting --zero_mask makes calls to CUB, which requires dynamic allocation of GPU memory. If RELION is grabbing most of the memory to manage through its own allocator, then my guess is that there's not enough dynamic memory left. I recall that we increased the dynamic allocation space when non-zero masking was moved to the GPU, but not the specifics.
With a 1080 you should have plenty; I'm just clarifying the context of this "out of memory" error. Hope that helps.
Hi Bjorn, I just heard back from our cluster administrator that he found a couple of NVIDIA card models where the example ran OK; I will pass the list along when I have it. Certainly the 2 MB test data set should not overtax the memory on the GTX 1080, at least I hope so! :)
No, but that's sort of the point: RELION doesn't know your input is 2 MB, so it takes a big chunk of the GPU as a big static allocation, leaving some dynamic allocation space. If other programs or circumstances reduce this dynamic allocation space even further, you could still run out. There is a flag --free_gpu_memory which specifies a number of MB of extra dynamic allocation space. You could always try using that, e.g. --free_gpu_memory 1000 or so. I'm not saying you should have to, but it's a diagnostic at least.
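The --free_gpu_memory flag itself is real (it takes a number of MB of extra dynamic allocation space, as described above); the helper below is only a hypothetical sketch of how a wrapper script might bolt it onto an existing command line for a diagnostic rerun. The function name and the 1000 MB default are assumptions, not part of RELION:

```python
def with_free_gpu_memory(cmd, megabytes=1000):
    """Return a copy of a relion_refine command list with extra dynamic
    GPU allocation space requested via --free_gpu_memory (value in MB)."""
    cmd = list(cmd)  # copy, so the caller's list is untouched
    if "--free_gpu_memory" not in cmd:
        cmd += ["--free_gpu_memory", str(megabytes)]
    return cmd

# Diagnostic rerun with 1000 MB of extra dynamic space:
base = ["relion_refine", "--i", "particles.star", "--gpu", ""]
print(" ".join(with_free_gpu_memory(base)))
```

If the job succeeds with the flag but fails without it, that points at exhausted dynamic allocation space rather than a genuine shortage of total GPU memory.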
Here is a list of which graphics cards successfully ran the test case above. I'll check whether '--free_gpu_memory 1000' helps things.

PASS: K80, RTX 2080
FAIL: GTX 1080Ti, RTX 5000, RTX 8000, P100, V100, TITAN V
Indeed, adding the option '--free_gpu_memory 1000' to the relion_refine command fixes the problem. This is a usable workaround. Not that I actually prefer the non-zero-masked method; it was just how I first thought to try it. Thanks, Bjorn.
"Non-zero masking" means masking with random noise, and generating random numbers on the GPU does require some extra space. Not sure how much, but clearly it can become an issue. At the time we implemented this, we set parameters that we thought were conservative. The fact that they seem not to be is another argument for a more elaborate memory estimation that makes the static allocation less greedy. I'm really surprised that some cards work and others don't, though...
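For a rough sense of scale: persistent on-GPU random number generation (e.g. with cuRAND) typically keeps one generator state per thread. The sketch below is purely back-of-the-envelope; the ~48 bytes per state figure is an assumption about cuRAND's default XORWOW state, not something taken from the RELION source:

```python
def rng_state_bytes(n_threads, bytes_per_state=48):
    """Device memory consumed just by per-thread RNG generator states."""
    return n_threads * bytes_per_state

# One state per pixel of a 512x512 particle image:
n = 512 * 512
print(f"{rng_state_bytes(n) / 2**20:.1f} MiB for {n} RNG states")
```

Small per image, but with many threads and batches in flight, and RELION's large static allocation already in place, it can plausibly tip a card over the edge, which is consistent with --free_gpu_memory helping.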
Indeed. I often use non-zero masking (this improves the resolution of some membrane proteins by reducing overfitting) without problems. We use 1080Ti, 1080, 2080Ti.
Hi, on a new machine with CUDA 11.1, a GeForce RTX 3090 and RELION 3.1.1, I am having exactly the same issue in Refine3D: when using non-zero masking, the GPU runs out of memory. The particle box size doesn't seem to make a difference. Not using non-zero masking, or specifying '--free_gpu_memory 1000', fixes the issue.
Here is the error message:
Auto-refine: Resolution= 10.0206 (no gain for 0 iter)
Auto-refine: Changes in angles= 999 degrees; and in offsets= 999 Angstroms (no gain for 0 iter)
Estimating accuracies in the orientational assignment ...
0/ 0 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 0.3085 degrees; offsets= 0.60554 Angstroms
CurrentResolution= 10.0206 Angstroms, which requires orientationSampling of at least 1.65899 degrees for a particle of diameter 690 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 2730
OrientationalSampling= 1.875 NrOrientations= 130
TranslationalSampling= 2.74 NrTranslations= 21
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 87360
OrientationalSampling= 0.9375 NrOrientations= 1040
TranslationalSampling= 1.37 NrTranslations= 84
=============================
Expectation iteration 1
000/??? sec ~~(,_,"> [oo]KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
RELION version: 3.1.1-commit-9f3bf1
exiting with an error ...
RELION version: 3.1.1-commit-9f3bf1
exiting with an error ...
KERNEL_ERROR: out of memory in /home/install/code/relion-3.1/src/acc/utilities_impl.h at line 253 (error-code 2)
RELION version: 3.1.1-commit-9f3bf1
exiting with an error ...
RELION version: 3.1.1-commit-9f3bf1
exiting with an error ...
Non-zero masking makes the probability distribution wider and requires more memory, especially when particles are difficult to align.
Possible solutions:
--maxsig 5000
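To see why a wider probability distribution costs memory, consider a toy 1-D version of the weight calculation (this is not RELION code; the Gaussian model, grid and threshold are purely illustrative). The number of sampling points that survive a significance threshold, and hence must be stored and processed, grows with the distribution width; a cap like --maxsig limits that number directly:

```python
import math

def significant_points(sigma, n_points=1000, threshold=1e-6):
    """Count sampling points whose normalised probability exceeds a
    threshold, for a 1-D Gaussian over orientation error (toy model)."""
    xs = [x / 10.0 for x in range(-n_points // 2, n_points // 2)]
    w = [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]
    total = sum(w)
    return sum(1 for v in w if v / total > threshold)

# A sharp distribution (well-aligned particles) keeps far fewer points
# than a wide one (hard-to-align particles):
print(significant_points(sigma=1.0))
print(significant_points(sigma=10.0))
```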
Hi, thank you for the prompt reply! I was actually doing a local angular search (0.9 degrees). Using --maxsig 5000 didn't fix the issue; still the same error.
This is the command I was using:
`which relion_refine_mpi` --o Refine3D/job013/run --auto_refine --split_random_halves --i particles_reorder_fre2relion.star --ref Class3D/job005/run_it001_class001.mrc --ini_high 10 --dont_combine_weights_via_disc --scratch_dir /ssd --pool 3 --pad 1 --skip_gridding --ctf --ctf_corrected_ref --particle_diameter 690 --flatten_solvent --solvent_mask shapeMask_nx512_CP/mask3D_500x281.mrc --solvent_correct_fsc --oversampling 1 --healpix_order 5 --auto_local_healpix_order 5 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --helix --helical_outer_diameter 500 --ignore_helical_symmetry --sigma_tilt 5 --sigma_psi 3.33333 --sigma_rot 0 --helical_keep_tilt_prior_fixed --j 3 --gpu "" --dont_check_norm --keep_scratch --reuse_scratch --pipeline_control Refine3D/job013/
What happens if you run Refine3D with zero-masking, stop it, and continue with non-zero masking from an intermediate iteration where the resolution is 4 A or so?
Let me try it. I forgot to mention that the exact same data/command runs perfectly fine on another machine with a GTX 1080Ti, CUDA 10.1 and RELION 3.1.1 (same version), so this issue seems to be related to the hardware/CUDA. Trying RELION 3.1.1 compiled with CUDA 10.1 on the new machine with the RTX 3090 gives a different error message:
in: /home/install/code/relion-3.1-cu10.1/src/acc/cuda/cuda_fft.h, line 224 ERROR:
When trying to plan one or more Fourier transforms, it was found that the available GPU memory was insufficient. Relion attempts to reduce the memory by segmenting the required number of transformations, but in this case not even a single transform could fit into memory. Either you are (1) performing very large transforms, or (2) the GPU had very little available memory.
(1) may occur during autopicking if the 'shrink' parameter was set to 1. The
recommended value is 0 (--shrink 0), which is argued in the RELION-2 paper (eLife).
This reduces memory requirements proportionally to the low-pass used.
(2) may occur if multiple processes were using the same GPU without being aware
of each other, or if there were too many such processes. Parallel execution of
relion binaries ending with _mpi ARE aware, but you may need to reduce the number
of mpi-ranks to equal the total number of GPUs. If you are running other instances
of GPU-accelerated programs (relion or other), these may be competing for space.
Relion currently reserves all available space during initialization and distributes
this space across all sub-processes using the available resources. This behaviour
can be escaped by the auxiliary flag --free_gpu_memory X [MB]. You can also go
further and force use of full dynamic runtime memory allocation, relion can be
built with the cmake -DCachedAlloc=OFF
in: /home/install/code/relion-3.1-cu10.1/src/acc/cuda/cuda_fft.h, line 224 ERROR: ERROR:
Did you specify CUDA_ARCH in cmake? The 3090 and the 1080 belong to different GPU architectures, so you either need PTX embedded in your binary or need to compile for the specific compute capability of each card. See https://docs.nvidia.com/deploy/cuda-compatibility/index.html.
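To illustrate the architecture point: a CUDA binary runs on a given card only if it contains machine code (a cubin) for that card's compute capability, or PTX the driver can JIT-compile. The capability numbers below are the well-known values for these cards; the checking function itself is only a sketch of the idea, not how CUDA or RELION actually decides:

```python
# Well-known compute capabilities of the cards discussed in this thread.
COMPUTE_CAPABILITY = {
    "GTX 1080":   "6.1",  # Pascal
    "GTX 1080Ti": "6.1",  # Pascal
    "RTX 2080":   "7.5",  # Turing
    "RTX 3090":   "8.6",  # Ampere; supported from CUDA 11.1 onward
}

def binary_runs_on(card, compiled_archs, has_ptx=False):
    """A binary needs either a cubin matching the card's capability,
    or embedded PTX so the driver can JIT for a newer card."""
    return COMPUTE_CAPABILITY[card] in compiled_archs or has_ptx

# Built only for Pascal (-DCUDA_ARCH=61): fails on a 3090 without PTX.
print(binary_runs_on("RTX 3090", {"6.1"}))
print(binary_runs_on("RTX 3090", {"8.6"}))
```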
Yes, we used -DCUDA_ARCH=86
For the 30x0 cards, you have to use CUDA >= 11.1: https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849. Compile with it, and also check with ldd that you are using the right runtime.
Here is the ldd result. Can you spot anything wrong?
zhangrui@sp3.wustl.edu:/usr/local/relion/bin$ ldd relion_refine_mpi
linux-vdso.so.1 (0x00007fff04bbe000)
libcufft.so.10 => /usr/local/cuda-11.1/lib64/libcufft.so.10 (0x00007ff7d3226000)
libmpi.so.40 => /opt/openmpi/4.0.5/lib/libmpi.so.40 (0x00007ff7d30fc000)
libtiff.so.5 => /lib/x86_64-linux-gnu/libtiff.so.5 (0x00007ff7d3060000)
libfftw3.so.3 => /usr/local/relion/lib/libfftw3.so.3 (0x00007ff7d2eab000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff7d2e86000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff7d2e7b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff7d2e75000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff7d2c94000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff7d2b45000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007ff7d2b03000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff7d2ae6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff7d28f4000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff7e1cb8000)
libopen-rte.so.40 => /opt/openmpi/4.0.5/lib/libopen-rte.so.40 (0x00007ff7d2838000)
libopen-pal.so.40 => /opt/openmpi/4.0.5/lib/libopen-pal.so.40 (0x00007ff7d2780000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007ff7d272f000)
libwebp.so.6 => /lib/x86_64-linux-gnu/libwebp.so.6 (0x00007ff7d24c6000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007ff7d241b000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007ff7d23f2000)
libjbig.so.0 => /lib/x86_64-linux-gnu/libjbig.so.0 (0x00007ff7d21e4000)
libjpeg.so.8 => /lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007ff7d215f000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff7d2143000)
libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007ff7d2109000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007ff7d2104000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007ff7d20ff000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007ff7d20d2000)
libltdl.so.7 => /lib/x86_64-linux-gnu/libltdl.so.7 (0x00007ff7d20c7000)
The ldd output looks fine. Sorry, I have no idea. We don't have 30x0 cards at hand, so we cannot investigate locally.
OK. No worries! I can use "--free_gpu_memory 1000" for now without any issue. Thanks!
I mentioned this to @arom4github, our collaborator at NVIDIA, to see if something changed in cuRAND.
@arom4github commented this:
rui--zhang said nothing about the number of MPI ranks he used. Probably it would be enough for him to have one MPI rank per GPU.
(I assumed you were doing so, but just to make sure)
On CCPEM and Twitter, there are several reports that RELION runs fine with 30x0 cards, but I am not sure whether they tried non-zero masking. Can you try non-zero masking on our tutorial dataset? Does it run fine?
I tried one MPI rank per GPU, still the same error.
Hi, for what it's worth, we consistently run into this error when we try to omit zero-masking and use our GPUs of multiple flavors. This is irrespective of particle dimension or bin factor (including very highly binned data with tiny dimensions). Non-GPU and/or zero-masked runs all work fine, so I think this points towards a GPU bug, not a memory limitation. - Chuck
@cvsindelar
use our GPUs of multiple flavors
Which GPU?
On our system with a 1080 Ti or 2080 Ti, it works fine. As I wrote above, can you test non-zero masking on our beta-galactosidase tutorial dataset?
One more comment on non-zero masking (and why I care about it): when I use a divide-and-conquer strategy to calculate different pieces of a big structure, with zero-masking the grey levels (and the noise levels) of the different pieces tend not to match each other, making the final stitched map look bad.
I am against the use of composite/stitched/"Frankenstein" maps, because the interfaces are not well defined. It is fine to make a low-resolution overview for supplementary figures, but please don't refine atomic models against it.
the grey level (and the noise level) of different pieces
Is this run_class001.mrc or the PostProcessed map? Also note that different resolutions and sharpening B-factors lead to different grey levels and background noise levels.
Since backprojection is done with unmasked particles, I don't know why non-zero masking changes the output map.
I agree the interface is not well preserved. The composite map is mainly used for figure making and map deposition.
For our doublet microtubule dataset, the grey levels (and the background noise) of the run_class001.mrc maps do not perfectly match each other when using zero-masking. https://www.sciencedirect.com/science/article/pii/S0092867419310815 (see Fig. S1; the whole structure was divided into 30 pieces)
Hi there, I discovered that, at least on our machines, RELION 3 and 3.1 are both entirely unable to run GPU-accelerated 3D classifications if the --zero_mask option is not used.
I attach a simple 2 MB data set that reproducibly generates this problem on multiple different machines, with multiple different RELION builds. The command runs successfully if either (1) no GPU option is given or (2) the --zero_mask option is used. relion_gpu_crash.zip
Job options (from note.txt in the job directory):
relion_refine --o ./output --i particles_mini.star --ref mt_reconstruct_20A_bin12.mrc --firstiter_cc --healpix_order 1 --j 1 --gpu ""
gpu-ids not specified, threads will automatically be mapped to devices (incrementally).
Thread 0 mapped to device 0
Running CPU instructions in double precision.
Estimating initial noise spectra
0/ 0 sec ............................................................~~(,_,">
CurrentResolution= 162.563 Angstroms, which requires orientationSampling of at least 25.7143 degrees for a particle of diameter 685.811 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 16704
OrientationalSampling= 30 NrOrientations= 576
TranslationalSampling= 2 NrTranslations= 29
Oversampling= 1 NrHiddenVariableSamplingPoints= 534528
OrientationalSampling= 15 NrOrientations= 4608
TranslationalSampling= 1 NrTranslations= 116
Expectation iteration 1 of 50
000/??? sec ~~(,_,"> [oo]KERNEL_ERROR: out of memory in /dev/shm/be59/build/RELION/3.0.8/fosscuda-2018b/relion-3.0.8/src/acc/utilities_impl.h at line 253 (error-code 2)
in: /dev/shm/be59/build/RELION/3.0.8/fosscuda-2018b/relion-3.0.8/src/acc/cuda/cuda_settings.h, line 81
in: /dev/shm/be59/build/RELION/3.0.8/fosscuda-2018b/relion-3.0.8/src/acc/cuda/cuda_settings.h, line 81
=== Backtrace ===
relion_refine(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x66) [0x43d806]
relion_refine(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xf3) [0x4a46f3]
relion_refine(_Z11_threadMainPv+0x36) [0x4c36a6]
/lib64/libpthread.so.0(+0x7dd5) [0x2aba23d00dd5]
/lib64/libc.so.6(clone+0x6d) [0x2aba246b602d]
ERROR:
A GPU-function failed to execute.
If this occurred at the start of a run, you might have GPUs which are incompatible with either the data or your installation of relion. If you
If this occurred at the middle or end of a run, it might be that
If none of the above applies, please report the error to the relion developers at github.com/3dem/relion/issues