jianghaizhu opened 4 years ago
Thanks for your bug report. Until we fix it, please run with very tiny sigma offset and sigma angles (e.g. 0.01), which are effectively zero.
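For example, the body STAR file entries would look like this (a sketch with placeholder mask and reference file names; only the 0.01 sigma values for the fixed body matter):

data_
loop_
_rlnBodyMaskName
_rlnBodyRotateRelativeTo
_rlnBodySigmaAngles
_rlnBodySigmaOffset
_rlnBodyReferenceName
masks/body1_mask.mrc 2 15 3 refs/consensus.mrc
masks/body2_mask.mrc 1 0.01 0.01 refs/consensus.mrc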
Thanks! The workaround works.
I cannot reproduce your problem. Can you show me your body STAR file? Does this happen in the first iteration, or later?
Here is my body STAR file.
data_
loop_
_rlnBodyMaskName
_rlnBodyRotateRelativeTo
_rlnBodySigmaAngles
_rlnBodySigmaOffset
_rlnBodyReferenceName
Mask-and-Ref/mask/IC_lp15_mask.mrc 2 15 3 PostProcess/job374/postprocess.mrc
Mask-and-Ref/mask/TM_lp15_mask.mrc 1 0 0 PostProcess/job374/postprocess.mrc
I just tested another run on a different machine. It happened at iteration 8. Here is the run.err.
3: MPI_ERR_TRUNCATE: message truncated
3: MPI_ERR_TRUNCATE: message truncated
in: /scratch/local/nasapps/relion/src/mpi.cpp, line 296
ERROR:
Encountered an MPI-related error, see above. Now exiting...
=== Backtrace ===
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x4c) [0x44e0fc]
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi(_ZN7MpiNode15relion_MPI_RecvEPvlP15ompi_datatype_tiiP19ompi_communicator_tR20ompi_status_public_t+0x2d2) [0x4ca2e2]
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi22combineAllWeightedSumsEv+0x37c) [0x4953dc]
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0x1ab) [0x4899cb]
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi(main+0x7d) [0x43a26d]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5ae0c65555]
/mnt/nasapps/production/relion/3.1/bin/relion_refine_mpi() [0x43a129]
==================
ERROR:
Encountered an MPI-related error, see above. Now exiting...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Does "Combine iterations through disc?: Yes" in the Compute tab help?
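For context: combining through disc writes the summed weights to files instead of exchanging them over MPI, so it sidesteps the relion_MPI_Recv call in combineAllWeightedSums that appears in your backtrace. MPI_ERR_TRUNCATE means a receiver posted a buffer smaller than the message actually sent, i.e. two ranks disagreed about the size of the array being exchanged. A minimal sketch of how that error arises (generic MPI code, not RELION's):

// Build and run: mpicxx trunc.cpp -o trunc && mpirun -np 2 ./trunc
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    // Return error codes instead of aborting, so the error can be printed.
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double big[8] = {0};
        MPI_Send(big, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double small[4];
        MPI_Status status;
        // The receive buffer (4 doubles) is smaller than the incoming
        // message (8 doubles): MPI_Recv fails with MPI_ERR_TRUNCATE.
        int err = MPI_Recv(small, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        if (err != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(err, msg, &len);
            printf("MPI_Recv failed: %s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}

If the zero sigmas lead one rank to size its weighted-sums arrays differently from the others, that could produce exactly this truncation, but that is speculation on my part.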
When I turned on "Combine iterations through disc?: Yes", the multibody refinement didn't crash, but it won't stop. Right now, it is over 200 iterations. I remember this happening to me before: if a multibody refinement crashed, I could start the process again with "Continue". Sometimes I could repeat "Continue" a couple of times until the iteration count reached 999, and then the process crashed.
Because I cannot reproduce your issue, I cannot help further. Recompiling RELION against a newer version of Open MPI might help.
Look at these lines in run.out:
Auto-refine: Resolution
Auto-refine: Changes in angles
Auto-refine: Estimated accuracy angles=
Auto-refine: Angular step=
For convergence, the resolution and the changes in angles should stop improving, and the angular step must be less than 75% of the estimated accuracy of the angles. If these keep fluctuating, you can stop the run and continue with --force_converge.
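If you want to watch just those lines, a simple grep works (the job path is a placeholder for your own job directory):

grep "Auto-refine" MultiBody/jobXXX/run.out

When the run has effectively converged but keeps going, relaunch the Continue job with --force_converge added to the additional arguments.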
First of all, running a 2-body refinement with one body fixed is the same as refinement with signal subtraction. There is no point in using multibody refinement for that.
I agree that it is the same as signal subtraction. But multibody refinement seems to be easier to set up.
But computationally more demanding.
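For reference, the subtraction route in RELION 3.1 would be a Particle subtraction job on the consensus refinement, with a mask around the domain you want to keep, followed by an ordinary Refine3D of the subtracted particles. On the command line this is roughly the following (job numbers are placeholders, and the flag names are from memory; check relion_particle_subtract --help):

relion_particle_subtract --i Refine3D/jobNNN/run_it0XX_optimiser.star --mask Mask-and-Ref/mask/IC_lp15_mask.mrc --o Subtract/jobNNN/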
Describe your problem
Multi-body refinement with 2 bodies. The SigmaAngles and SigmaOffset were set to 0 for the smaller domain. If they are not set to 0, everything runs just fine.