3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Error with relion5 using 2D classification on aws g6 instances #1148

Open Cookiemaster33 opened 2 months ago

Cookiemaster33 commented 2 months ago

Hi there, I am using relion5, running via SGE/qsub on AWS clusters.

So far everything was running fine on g5 instances, which use NVIDIA A10G Tensor Core GPUs. We have now switched to g6 instances, which use NVIDIA L4 Tensor Core GPUs. During 2D classification we get the error: "failed to create cufft plan".

Any idea what could be wrong?

Thanks and best

Toby

Environment:

Dataset:

Job options:

in: /relion/src/projector.cpp, line 362
ERROR: failed to create cufft plan
=== Backtrace ===
/opt/relion/bin/relion_refine(_ZN11RelionErrorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x77) [0x56106c48bbd7]
/opt/relion/bin/relion_refine(_ZN9Projector26computeFourierTransformMapER13MultidimArrayIdES2_iibbiPKS1_b+0x36a3) [0x56106c52a8c3]
/opt/relion/bin/relion_refine(_ZN7MlModel23setFourierTransformMapsEbidPK13MultidimArrayIdE+0x901) [0x56106c69d271]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser16expectationSetupEv+0x5a) [0x56106c4b16ea]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser11expectationEv+0x34) [0x56106c4e1824]
/opt/relion/bin/relion_refine(_ZN11MlOptimiser7iterateEv+0x37a) [0x56106c4fd63a]
/opt/relion/bin/relion_refine(main+0x51) [0x56106c476c91]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x14c1b6623bf7]
/opt/relion/bin/relion_refine(_start+0x2a) [0x56106c47a5ea]

ERROR: failed to create cufft plan

biochem-fan commented 2 months ago

Which version of CUDA did you use to compile RELION? Is it compatible with "Ubuntu 18.04.5 LTS"? This is a very old OS and you shouldn't use it.

Did you specify CUDA_ARCH? (You shouldn't, if you want to share the binary with different GPUs).
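
To help answer the CUDA_ARCH question, it may be useful to confirm which compute capability each instance type actually reports. The following is a minimal sketch, not part of RELION: the helper names `parse_gpu_query` and `query_gpus` are hypothetical, and it assumes the installed `nvidia-smi` is recent enough to support the `compute_cap` query field. According to NVIDIA's published tables, an A10G should report 8.6 and an L4 should report 8.9.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse 'name, compute_cap' CSV rows into (name, (major, minor)) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is tolerated.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        gpus.append((name, (int(major), int(minor))))
    return gpus


def query_gpus():
    """Ask nvidia-smi for each GPU's name and compute capability.

    Returns an empty list if nvidia-smi is not on PATH.
    Assumption: the driver's nvidia-smi supports the 'compute_cap' field.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_query(out)


if __name__ == "__main__":
    for name, (major, minor) in query_gpus():
        print(f"{name}: compute capability {major}.{minor} (sm_{major}{minor})")
```

Running this on both a g5 and a g6 node, and comparing the result with the CUDA toolkit version and any CUDA_ARCH value used at build time, should show whether the binary and its cuFFT library were built with the L4 in mind.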