Open krogala opened 11 months ago
This is new, and I noticed that it only started happening after a new Relion compilation.
You wrote "4.0.1-commit-db9717". Was this commit working fine before you recompiled your binary? Or were you using an earlier commit of RELION 4.0.1?
Also: does this happen on any Refine3D jobs, or on a particular job? In the latter case, does it happen if you continue from earlier iterations?
Thank you for your quick response, @biochem-fan!
Indeed, you pointed out the exact issue that's been happening -- regarding only specific jobs throwing this error. I have now spent some time doing extensive testing of this phenomenon, and it looks like all "regular" Relion Refine3D jobs run well (and continue properly from optimiser.star) on either: v4.0.0, v4.0.1, or v5.0 -- with and without Blush regularization.
The only instance where I'm currently seeing this HealpixSampling error is when working with particles imported from cryoSPARC (using pyem's csparc2star.py). Originally, these particles come from Relion (after polishing), and were then temporarily moved to cryoSPARC for some 3DVA work. I want to bring them back to Relion, and technically, Refine3D jobs with these particles run to completion -- but only if the whole run completes without interruption. The moment it crashes (due to VRAM, etc.) and the job is resumed from optimiser.star, I get the Healpix error. It makes no difference whether I choose the latest optimiser.star or an earlier one.
Any suggestions about what to look for would be great! I will try examining individual columns to check whether some of these values are causing the problem. I tried replacing the entire _dataoptics table, but that didn't help.
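One way to examine individual columns is sketched below. It is a minimal stand-alone reader for a single loop_ table in a STAR file, not RELION or pyem code. The helper names (read_star_loop, non_numeric_cells) and the sample data are made up for illustration, and it assumes a simple layout: one loop_ per data block, whitespace-separated values, and no quoted strings containing spaces.

```python
import math

def read_star_loop(text, block_name):
    """Return (column_names, rows) for the loop_ table in the named data block.

    Hypothetical helper: assumes one loop_ per block and whitespace-separated values.
    """
    columns, rows = [], []
    in_block = in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue                      # skip comment lines
        if not line:
            if in_loop and rows:
                break                     # a blank line after data rows ends the table
            continue
        if line.startswith("data_"):
            if in_loop:
                break                     # the next block starts, so the table is done
            in_block = (line == block_name)
            continue
        if not in_block:
            continue
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_") and not rows:
            columns.append(line.split()[0])   # "_rlnAngleRot #1" -> "_rlnAngleRot"
        elif in_loop:
            rows.append(line.split())
    return columns, rows

def non_numeric_cells(columns, rows, names):
    """List (row_index, column, value) for cells that are not finite numbers."""
    bad = []
    for r, row in enumerate(rows):
        for name in names:
            value = row[columns.index(name)]
            try:
                ok = math.isfinite(float(value))
            except ValueError:
                ok = False
            if not ok:
                bad.append((r, name, value))
    return bad

# Made-up sample data, only to show the calls.
SAMPLE = """
data_optics

loop_
_rlnOpticsGroup #1
_rlnImagePixelSize #2
1 0.83

data_particles

loop_
_rlnAngleRot #1
_rlnAngleTilt #2
_rlnOpticsGroup #3
10.5 45.0 1
nan 90.0 1
"""

cols, rows = read_star_loop(SAMPLE, "data_particles")
print(non_numeric_cells(cols, rows, ["_rlnAngleRot", "_rlnAngleTilt"]))
```

Running the same check on the original Relion particle file and on the csparc2star.py output would show whether any column differs in a suspicious way.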
Thank you very much for the detailed investigation. Unfortunately, I have no idea, as I don't use CS at all. I suggest you report this to the CCPEM mailing list. Others might be facing the same issue and have workarounds.
Alright! I found what the issue is. Rather unexpected, because it seems to have nothing to do with where the .star file came from. In this case, the cryoSPARC-imported .star file checks out.
The problem is the --relax_sym parameter. Essentially, when resuming any Refine3D job from an optimiser.star that was originally started with a --relax_sym C2 parameter, I get the following warning and crash.
WARNING: Option --relax_sym is not a valid RELION argument
XSIZE(pdf_direction)= 192 rot_angles.size()= 96
in: /home/groups/rogala/SOFTWARE/relion/v5.0/relion/src/healpix_sampling.cpp, line 2003
ERROR:
HealpixSampling::writeBildFileOrientationalDistribution XSIZE(pdf_direction) != rot_angles.size()!
No other parameter seems to trigger it. Also, when resuming the job with the --relax_sym parameter empty, the job crashes just as well.
This is true as of version 5.0-beta-0-commit-90d239.
The optimiser.star files are practically identical between the two "treatments" with/without the relax_sym parameter specified.
However, for those sampling.star files that have _rlnHealpixOrder=2, I can see the following difference:
relax_sym=C2 ==> 192x angle combinations in the data_sampling_directions table
relax_sym=[] ==> 96x angle combinations in the data_sampling_directions table
Is this the difference that the error is pointing to?
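The 192 and 96 counts match plain HEALPix arithmetic. A HEALPix grid has 12 * Nside^2 pixels, and the sketch below assumes Nside = 2^order, which gives 192 directions for order 2. It also assumes that sampling under C2 symmetry keeps only one asymmetric unit, i.e. half of the sphere, which would give 96. The function names are made up for illustration, and the simple division is only a rough guide for other point groups, where pixels on the boundary may change the exact count. This is arithmetic only; it does not show where in RELION the two sizes diverge on resume.

```python
def healpix_pixel_count(order):
    """Number of pixels on a full HEALPix sphere, assuming Nside = 2**order."""
    nside = 2 ** order
    return 12 * nside * nside

def expected_directions(order, cyclic_fold=1):
    """Rough direction count for an n-fold cyclic group.

    Assumption: only one asymmetric unit (1/n of the sphere) is sampled.
    """
    return healpix_pixel_count(order) // cyclic_fold

print(healpix_pixel_count(2))      # full sphere (C1): 192
print(expected_directions(2, 2))   # C2 asymmetric unit: 96
```

Under these assumptions, XSIZE(pdf_direction)= 192 corresponds to a full-sphere (C1) sampling and rot_angles.size()= 96 to a C2-reduced one, which is consistent with the two sampling.star files described above.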
Dear Developers,
Any insights into how to fix this type of Refine3D crash would be most helpful!
Many thanks,
Kacper
ERROR DESCRIPTION: Any attempt to resume Refine3D from an optimiser.star file ends with the following error for me. This is new, and I noticed that it only started happening after a new Relion compilation.
HealpixSampling::writeBildFileOrientationalDistribution XSIZE(pdf_direction) != rot_angles.size
EXTRA INFO: Here are the modules that I'm using to compile (and run) Relion v4.0.1:
From CMakeCache.txt, the gcc compilers were as follows:
I even went ahead and compiled xpdf/4.04 with qt/5.9.1 -- just in case this was a PDF reader error of sorts -- no difference.
Below are the full run.out and run.err files.
run.err
run.out