Closed: heejongkim closed this issue 2 years ago
The error looks like a CUDA runtime problem. Are you sure your compilation process was fine? Are you using a version of the CUDA runtime that is compatible with your binary?
What happens if you re-compile 3.1.1?
I have been using the same binary since the 3.1.2 release, and the error only started appearing a week or two ago, suddenly, with the same dependencies.
Should I recompile them (3.1.1 and 3.1.2 and 3.1.3) and test them out?
Any recommended CUDA version?
> Should I recompile them (3.1.1 and 3.1.2 and 3.1.3) and test them out?

Yes, please.

> Any recommended CUDA version?

Nothing in particular.
Recompiling 3.1.3 with an older GCC (4.8.5 instead of 7.5.0) and the corresponding OpenMPI version fixed the issue, at least tentatively. (I suspect that the manually compiled MPICH 3.4.1, built with GCC 7.5.0, might have caused the issue.)
Is there a page listing the dependencies/compilers and their versions that I've been missing?
Thank you.
I don't know why MPICH with GCC 7.5.0 does not work. We mostly work on GCC or Intel Compiler with OpenMPI. Other combinations should work but we ourselves cannot test or investigate them due to limited resources.
I will try a few combinations of different GCC versions with OpenMPI and MPICH.
I encountered this error recently. Installing every combination of GCC and OpenMPI did not help; what solved it was removing duplicate particles from the particles.star file.
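For anyone who wants to try the duplicate-particle fix, here is a minimal Python sketch of one way to do it. It is not the commenter's actual procedure, which was not described. It assumes that a "duplicate" means two rows in the `data_particles` block sharing the same `_rlnImageName` value, and that the file uses the usual `loop_` layout with one label per line. The function name `drop_duplicate_rows` and the example paths are made up for illustration. Work on a copy of your STAR file.

```python
def drop_duplicate_rows(star_text, block="data_particles", key="_rlnImageName"):
    """Drop repeated rows from one loop_ block of a STAR file.

    Assumption: rows are duplicates when they share the same value in `key`.
    Returns (cleaned_text, number_of_rows_removed). The first occurrence is kept.
    """
    out = []
    seen = set()
    removed = 0
    in_block = False   # inside the block we want to clean
    in_loop = False    # past the "loop_" keyword of that block
    columns = []       # label names, in header order

    for line in star_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("data_"):
            # A new data block starts; reset the per-block state.
            in_block = (stripped == block)
            in_loop = False
            columns = []
        elif in_block and stripped == "loop_":
            in_loop = True
        elif in_block and in_loop and stripped.startswith("_"):
            # Header line such as "_rlnImageName #1"; keep only the label.
            columns.append(stripped.split()[0])
        elif in_block and in_loop and stripped and not stripped.startswith("#"):
            # Data row: compare the key column against rows already seen.
            fields = stripped.split()
            if key in columns and columns.index(key) < len(fields):
                value = fields[columns.index(key)]
                if value in seen:
                    removed += 1
                    continue  # skip the duplicate row
                seen.add(value)
        out.append(line)

    return "\n".join(out) + "\n", removed


# Hypothetical miniature STAR file; the paths and values are invented.
EXAMPLE = """\
data_optics

loop_
_rlnOpticsGroup #1
_rlnImagePixelSize #2
1 1.06

data_particles

loop_
_rlnImageName #1
_rlnCoordinateX #2
_rlnCoordinateY #3
000001@Extract/mic1.mrcs 100.0 200.0
000002@Extract/mic1.mrcs 300.0 400.0
000001@Extract/mic1.mrcs 100.0 200.0
"""

cleaned, n_removed = drop_duplicate_rows(EXAMPLE)
print(f"removed {n_removed} duplicate row(s)")
```

For a real file, read it with `open(path).read()`, pass the text to the function, and write the cleaned text to a new file rather than overwriting the original. If your duplicates are particles picked twice at nearly the same coordinates rather than identical image names, this key-based check will not catch them.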
This is a template for reporting bugs. Please fill in as much information as you can.
Suddenly, with both 3.1.2 and 3.1.3, this error started showing up in relion_refine_mpi jobs, including initial model, 3D classification, and so forth.
Environment:
Dataset:
Job options (note.txt in the job directory):
Error message:
Please cite the full error message as the example below.