3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

unexpectedly small, yet non-zero sigma2 value #582

Closed jhansen6 closed 4 years ago

jhansen6 commented 4 years ago

running relion/3.1b0-cuda

Hi, I get the error below when running a 3D refinement.

The particles are from a brand new extraction of a previously identified subset of particles (see attached screenshot). I aligned my micrographs in relion. I used standalone gctf to estimate CTF and generate a micrographs.star file (see attached screenshot). However, when I run a refinement I get this error at the end of the 1st iteration, in what I think is the maximization step. Thanks!

I'm guessing the problem is in my extraction step. Here it is:

mpirun --np 40 relion_preprocess_mpi --i micrographs_ctf.star --reextract_data_star input_bundles4.star --part_star Extract/job001_512_bin2/particles.star --part_dir Extract/job001_512_bin2/ --extract --extract_size 512 --scale 256 --norm --bg_radius 96 --white_dust -1 --black_dust -1 --invert_contrast >&extract.log

And here is my refine script:

mpirun --np 5 relion_refine_mpi --gpu 0:1:2:3 --o Refine3D/job001/run --auto_refine --split_random_halves --i Extract/job001_512_bin2/particles_beamtilt.star --ref inputmodels/cryosparcmodel_bin2_512.mrc --firstiter_cc --ini_high 40 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 200 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 3 --auto_local_healpix_order 5 --offset_range 5 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --j 1 --pipeline_control Refine3D/job009/ >&refine.log

Screen Shot 2020-02-26 at 9 56 12 AM

Screen Shot 2020-02-26 at 9 56 03 AM

==================
ERROR: BackProjector::reconstruct: ERROR: unexpectedly small, yet non-zero sigma2 value, this should not happen...a
ERROR: cannot touch file: Refine3D/job009/RELION_JOB_EXIT_FAILURE

RELION version: 3.1-beta-commit-a6aaa5
exiting with an error ...

MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 13.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

ng half-reconstructions up to 41.3538 Angstrom resolution to prevent diverging orientations ...
Note that only for higher resolutions the FSC-values are according to the gold-standard!
Calculating gold-standard FSC ...
Maximization ...
000/??? sec ~~(,,"> [oo]
DIRECT_A1D_ELEM(sigma2, i)= nan
in: /tmp/relion/src/backprojector.cpp, line 1086
ERROR: BackProjector::reconstruct: ERROR: unexpectedly small, yet non-zero sigma2 value, this should not happen...a
=== Backtrace ===
relion_refine_mpi(_ZN11RelionErrorC2ERKSsS1_l+0x41) [0x44e431]
relion_refine_mpi(_ZN13BackProjector16updateSSNRarraysEdR13MultidimArrayIdES2_S2_S2_RKS1_bb+0x14e2) [0x4de332]
relion_refine_mpi(_ZN14MlOptimiserMpi12maximizationEv+0x11cb) [0x47c3ab]
relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0x3f8) [0x47e338]
relion_refine_mpi(main+0x5f) [0x43a31f]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3b4c41ed1d]
relion_refine_mpi() [0x43ae39]


biochem-fan commented 4 years ago

Why did you use GCTF outside RELION? This is not recommended. Your metadata might not be correct.

jhansen6 commented 4 years ago

I have been using gctf outside relion because it is always very challenging to get ctf estimation running in relion for anyone in our lab.

I just tried again and it keeps failing, here is my command and the output error.


Open MPI tried to fork a new process via the "execve" system call but failed. Open MPI checks many things before attempting to launch a child process, but nothing is perfect. This error may be indicative of another problem on the target host, or even something as silly as having specified a directory for your application. Your job will now abort.

mpirun --np 10 relion_run_ctffind_mpi --i MotionCorr/attempt5/micrographs/*.mrc --o CtfFind/job009/ --Box 512 --ResMin 30 --ResMax 5 --dFMin 5000 --dFMax 50000 --FStep 500 --dAst 100 --ctffind_exe /opt/applications/ctf/4.1.10/bin/ctffind4 --ctfWin -1 --is_ctffind4 --fast_search --use_given_ps --pipeline_control CtfFind/job009/ &>ctffind4.log

Local host: emnode2
Application name: /opt/applications/relion/3.1b1/gnu/bin/relion_run_ctffind_mpi
Error: Argument list too long

[emnode2:19662] 9 more processes have sent help message help-orte-odls-default.txt / execve error
[emnode2:19662] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

biochem-fan commented 4 years ago

Please use the GUI to generate the command line. --i should be the STAR file, not individual micrographs.

Please make sure you can process our tutorial dataset (beta-galactosidase). If it does not work as explained in the document, your installation is wrong, and you should fix installation first instead of resorting to tricky workarounds.
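The `Argument list too long` error in the previous comment is what `execve` reports when the shell expands a glob such as `*.mrc` into one argument per file and the total exceeds the system limit. A minimal Python sketch, assuming a POSIX system, for estimating how much space an expanded argument list needs; the function names and the sample file names are illustrative and not part of RELION or Open MPI:

```python
import os


def argv_bytes(args):
    """Approximate bytes needed for an argument list: each string plus its NUL terminator."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def exceeds_arg_max(args, limit=None):
    """Return True if the arguments alone are larger than the exec limit."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")  # POSIX only
    return argv_bytes(args) > limit


# Hypothetical file names imitating an expanded micrographs/*.mrc glob.
micrographs = [
    f"MotionCorr/attempt5/micrographs/mic_{i:05d}_frames.mrc" for i in range(100000)
]
print(argv_bytes(micrographs), exceeds_arg_max(micrographs))
```

The kernel limit also has to hold the environment variables, so this estimate is only a lower bound on what a real command consumes. Passing a single STAR file to `--i` avoids the problem because the list of micrographs then lives in the file rather than on the command line.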

jhansen6 commented 4 years ago

thanks for the response. yes the solution was to point to a .star file rather than ../*mrc. Sorry, in my rush/frustration I made that careless mistake.

It is running now. Afterwards I will re-extract and retry the refinement to see whether my initial problem persists.

I am very appreciative of your help.

jhansen6 commented 4 years ago

hi, yes I went through the ctffind4 process to do things within relion, re-extracted the particles and re-ran the refinement, and I still obtain the same error that I mentioned at the start of this thread. Any other suggestions for what might cause this?

thanks.

jhansen6 commented 4 years ago

tried googling it and I found a thread from 2016 saying it's a bug which was supposedly fixed. I checked some of the micrographs and they seem to be fine, I checked the location of the particle picks and they seem to be fine. It's strange.

biochem-fan commented 4 years ago

Please look at the particle STAR file after extraction. Paste the optics group table and a few rows of the data table. Be especially careful about the pixel size in the optics group table at the beginning. Also check the pixel size in the header of your reference and the mask. The header of MRC files did not matter in RELION 3.0 but it DOES MATTER in 3.1.

You said you are using RELION 3.1 but the STAR file you put in the first post looks like RELION 3.0.
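One way to check the pixel size stored in the header of a reference or mask is to read it directly from the MRC file. The sketch below assumes a little-endian file in the standard MRC layout, where the pixel size along X is the cell dimension divided by the grid sampling `MX`. The helper names and the comparison tolerance are illustrative choices, not RELION's own check:

```python
import struct


def mrc_pixel_size(header: bytes) -> float:
    """Pixel size along X (Angstrom) from the start of an MRC header.

    Assumes little-endian data. Bytes 28-39 hold MX, MY, MZ (int32) and
    bytes 40-51 hold the cell dimensions in Angstrom (float32).
    """
    if len(header) < 52:
        raise ValueError("need at least the first 52 bytes of the header")
    (mx,) = struct.unpack_from("<i", header, 28)
    (cell_x,) = struct.unpack_from("<f", header, 40)
    if mx == 0:
        raise ValueError("MX is zero; the header does not define a pixel size")
    return cell_x / mx


def read_mrc_pixel_size(path):
    """Read the pixel size from a file on disk."""
    with open(path, "rb") as fh:
        return mrc_pixel_size(fh.read(1024))


def pixel_sizes_agree(a, b, tol=1e-3):
    """Compare two pixel sizes; the tolerance is an arbitrary illustrative value."""
    return abs(a - b) <= tol


# Synthetic header standing in for a 256-voxel map at 2.12 A/px (hypothetical values).
demo = bytearray(1024)
struct.pack_into("<3i", demo, 28, 256, 256, 256)
struct.pack_into("<3f", demo, 40, 542.72, 542.72, 542.72)
print(round(mrc_pixel_size(bytes(demo)), 3))
```

The value returned for the reference and the mask can then be compared with `_rlnImagePixelSize` in the optics table of the particle STAR file.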

jhansen6 commented 4 years ago

I was very hopeful that this was the solution. I checked and you were correct that there was a minor discrepancy due to rounding; I was unaware that relion 3.1 has this sensitivity. When doing CTF estimation I used pixel size 1.05, but my micrographs were aligned from super-resolution to yield 1.06 (unbinned 0.53 rather than 0.525). So I re-imported the micrographs in relion using the true pixel size of 1.06, re-ran ctf estimation, re-extracted (bin by 2 during extraction), and tried again with the refinement. Alas, the exact same error :(
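The pixel sizes in this comment follow from simple arithmetic, worked through here as a small Python sketch. It assumes that binning multiplies the pixel size by the bin factor and that extracting with `--extract_size 512 --scale 256` scales it by 512/256; the function names are illustrative:

```python
def binned_pixel_size(pixel_size, bin_factor):
    """Pixel size after binning a micrograph by an integer factor."""
    return pixel_size * bin_factor


def rescaled_pixel_size(pixel_size, extract_size, scaled_size):
    """Pixel size of particles extracted in a box of extract_size and rescaled to scaled_size."""
    return pixel_size * extract_size / scaled_size


for super_res in (0.525, 0.53):
    mic = binned_pixel_size(super_res, 2)
    part = rescaled_pixel_size(mic, 512, 256)
    print(f"{super_res} -> micrograph {mic:.3f} A/px, particle {part:.3f} A/px")
```

The two super-resolution values differ by about 1%, and that difference carries through unchanged to the micrograph and particle pixel sizes.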

My original particle stack was relion 2.0 actually, I was able to get this to extract and run in 2.0, but now in 3.1 it is nearly impossible for some reason. Perhaps part of the problem is using my particles file from a relion 2.0 reconstruction, I used relion star convert but perhaps something went wrong in the process. I wish I could just provide X/Y coords + micrograph name and do a fresh extract but it is impossible in relion 3.1. So many other things are required such as _rlnImageName.

thanks again, any thoughts would be appreciated.

header for a micrograph:
Number of columns, rows, sections ..... 3710 3838 1
Map mode .............................. 2 (32-bit real)
Start cols, rows, sects, grid x,y,z ... 0 0 0 3710 3838 1
Pixel spacing (Angstroms).............. 1.060 1.060 1.060
Cell angles ........................... 90.000 90.000 90.000
Fast, medium, slow axes ............... X Y Z
Origin on x,y,z ....................... 0.000 0.000 0.000
Minimum density ....................... 14.664
Maximum density ....................... 17.922
Mean density .......................... 16.072
tilt angles (original,current) ........ 0.0 0.0 0.0 90.0 90.0 90.0
Space group,# extra bytes,idtype,lens . 0 0 5 0

`
# version 30001

data_optics

loop_
_rlnOpticsGroupName #1
_rlnOpticsGroup #2
_rlnMicrographPixelSize #3
_rlnMicrographOriginalPixelSize #4
_rlnVoltage #5
_rlnSphericalAberration #6
_rlnAmplitudeContrast #7
opticsGroup1 1 1.060000 1.060000 300.000000 2.700000 0.100000

# version 30001

data_micrographs

loop_
_rlnMicrographName #1
_rlnOpticsGroup #2
_rlnCtfImage #3
_rlnDefocusU #4
_rlnDefocusV #5
_rlnCtfAstigmatism #6
_rlnDefocusAngle #7
_rlnCtfFigureOfMerit #8
_rlnCtfMaxResolution #9
MotionCorr/attempt1/micrographs/18dec29b_grid1b_00034gr_00027sq_v02_00002hl_v01_00002en_frames.mrc 1 CtfFind/job009/MotionCorr/attempt1/micrographs/18dec29b_grid1b_00034gr_00027sq_v02_00002hl_v01_00002en_frames.ctf:mrc 14119.621094 13330.124023 789.497070 -85.98532 0.267457 3.170619
`

particles file:
`
# version 30001

data_optics

loop_
_rlnOpticsGroup #1
_rlnOpticsGroupName #2
_rlnImagePixelSize #3
_rlnImageSize #4
_rlnImageDimensionality #5
_rlnPixelSize #6
_rlnVoltage #7
_rlnSphericalAberration #8
1 opticsGroup1 1.060000 512 2 1.06 300 2.7

# version 30001

data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnMicrographName #3
_rlnImageName #4
_rlnOpticsGroup #5
3043.232306 2209.830000 MotionCorr/attempt1/micrographs/19mar02a_grid1_00008gr_00068sq_v02_00002hl_v01_00002en_frames.mrc 000001@Extract/job002_nobin_512/MotionCorr/attempt1/micrographs/18dec29b_grid1b_00034gr_00027sq_v02_00002hl_v01_00003en_frames.mrcs 1
`
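The single particle row pasted above pairs a micrograph named `19mar02a_...` with an image stack named `18dec29b_...`. If that reflects the real file rather than a copy-paste slip, it is the kind of inconsistency a quick script can flag. The sketch below assumes that an extracted `.mrcs` stack normally carries the same base name as the micrograph it was cut from; the function name is illustrative:

```python
import os


def stack_matches_micrograph(micrograph_name, image_name):
    """True if the stack in image_name has the same base name as the micrograph."""
    stack_path = image_name.split("@", 1)[-1]  # drop the "000001@" slice index
    mic_stem = os.path.splitext(os.path.basename(micrograph_name))[0]
    stack_stem = os.path.splitext(os.path.basename(stack_path))[0]
    return mic_stem == stack_stem


# The row quoted in the thread.
micrograph = (
    "MotionCorr/attempt1/micrographs/"
    "19mar02a_grid1_00008gr_00068sq_v02_00002hl_v01_00002en_frames.mrc"
)
image = (
    "000001@Extract/job002_nobin_512/MotionCorr/attempt1/micrographs/"
    "18dec29b_grid1b_00034gr_00027sq_v02_00002hl_v01_00003en_frames.mrcs"
)
print(stack_matches_micrograph(micrograph, image))
```

Running such a check over every row of `data_particles` would show whether the mismatch is isolated or systematic.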

biochem-fan commented 4 years ago

I think you have completely messed up your metadata. I am afraid I cannot help any further without looking at your project directory and history.

Probably you should start from scratch in RELION 3.1; this might take time, but you will also benefit from many new features like Bayesian Polishing (You cannot run Polish on movies motion corrected on RELION 2.x).

jhansen6 commented 4 years ago

hi. Thank you. Yes the entire project has already been started over in relion 3.1. My main issue is that I would like to recover the particles which I have already identified to be good. This was simpler in previous relion versions, but in 3.1 is nearly impossible -- as evidenced by the current error I am obtaining.

Is it possible to provide X/Y coordinates to recover my particles in a relion 3.1 project started from scratch? Or do I need to redo template picking etc.? I have tried providing just X/Y coords and micrograph names, but I get errors asking for pixel size, voltage, etc. (which should be in the micrographs and ctf estimation star file, but okay, I'll provide it anyway). Then it gives me an error asking for _rlnImageName. But why does it need this when I simply want to extract at X/Y coordinates? Seriously, any help would be appreciated because I am very happy with the list of particle coordinates. It would be really awesome to have a tool in relion where you can extract particles based simply on X/Y coords and then use the micrographs_ctf.star metadata for the rest of the information.

biochem-fan commented 4 years ago

In this case, you should generate one STAR file per micrograph, mimicking the result of AutoPick. Then you can extract particles from CoordinateX/Y alone (remove all other columns, otherwise it might conflict with information in the micrograph STAR file).

You can use a tool like this: https://github.com/cdienem/StarTool#split-and-merge-operations
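The splitting step described above can also be sketched in a few lines of Python. This is a minimal illustration, not a replacement for StarTool: it assumes a simple one-row-per-line STAR layout, the `_autopick.star` suffix and the empty `data_` block name are assumptions about what the Extract job expects, and the coordinates are assumed to already be in the pixel units of the micrographs being extracted from.

```python
import os
from collections import defaultdict


def read_star_loop(text, block):
    """Return (labels, rows) for the loop_ table in the named data block."""
    labels, rows, in_block, in_loop = [], [], False, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            in_block, in_loop = (line == block), False
        elif in_block and line == "loop_":
            in_loop = True
        elif in_block and in_loop and line.startswith("_"):
            labels.append(line.split()[0])
        elif in_block and in_loop:
            rows.append(line.split())
    return labels, rows


def coords_per_micrograph(labels, rows):
    """Group (x, y) coordinate strings by micrograph name."""
    ix = labels.index("_rlnCoordinateX")
    iy = labels.index("_rlnCoordinateY")
    im = labels.index("_rlnMicrographName")
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[im]].append((row[ix], row[iy]))
    return grouped


def coordinate_star(coords):
    """STAR text holding only the two coordinate columns."""
    lines = ["", "data_", "", "loop_", "_rlnCoordinateX #1", "_rlnCoordinateY #2"]
    lines += [f"{x} {y}" for x, y in coords]
    return "\n".join(lines) + "\n"


def write_coordinate_files(grouped, out_dir, suffix="_autopick.star"):
    """Write one coordinate file per micrograph, mirroring the micrograph path."""
    for mic, coords in grouped.items():
        path = os.path.join(out_dir, os.path.splitext(mic)[0] + suffix)
        folder = os.path.dirname(path)
        if folder:
            os.makedirs(folder, exist_ok=True)
        with open(path, "w") as fh:
            fh.write(coordinate_star(coords))
```

A typical use would be to read the old particle file, call `read_star_loop(text, "data_particles")`, and pass the grouped result to `write_coordinate_files` with a hypothetical output directory that the Extract job is then pointed at.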

jhansen6 commented 4 years ago

fantastic thank you! I appreciate your help, sorry for so many questions.

jhansen6 commented 4 years ago

any insight yet into what causes this problem? I literally encounter it 1-2 times per week and it destroys my productivity. In relion 3.0 and 3.1.

For example, I started a project in relion, processed my data to 2.5 Å (yay!), and have now tried to re-extract centered on a different part of the protein to process again. Extraction went fine. However, I run into the same non-zero sigma2 error again. Clearly nothing is wrong with the metadata, because I have been working with the particles.star file and micrographs.star file until this point with zero problems. Perhaps when I shifted the re-extraction center it caused the coordinates to move off the edge of the micrograph?
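The off-edge hypothesis raised here can be tested on the coordinates directly. The sketch below only checks geometry: it flags a coordinate whose extraction box would extend past the micrograph border. It does not model what RELION itself does with such particles, and the function name is illustrative. The micrograph dimensions (3710 x 3838) and the box size (512) are the values quoted earlier in the thread:

```python
def box_leaves_micrograph(x, y, box, nx, ny):
    """True if a box of side `box` centered on (x, y) extends past an nx-by-ny micrograph."""
    half = box / 2
    return x - half < 0 or y - half < 0 or x + half > nx or y + half > ny


# Coordinate from the particle row quoted earlier in the thread.
print(box_leaves_micrograph(3043.232306, 2209.83, 512, 3710, 3838))
```

Applying this to the re-centered coordinates of every particle would show how many boxes, if any, cross the border after the shift.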

biochem-fan commented 4 years ago

Are you aware that if you re-extract with re-centering to a subunit, you have to shift the map and the mask as well? In other words, if you re-center to something other than (0, 0, 0), you should shift the map and the mask.

jhansen6 commented 4 years ago

yep I create a new starting model and mask which are on the new center. As a test I exported the extracted particles to cisTEM and cryosparc and both were able to run 3D refinements no problem. So confusing. Maybe it's something about how relion handles pixels off the edge?

biochem-fan commented 4 years ago

This is very difficult to investigate. If you can make a small subset (say 1000 particles) of your dataset that reproducibly leads to the crash and share it with us privately, we will investigate further.
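A small subset like the one requested could be drawn with a short script. This is a minimal sketch assuming a RELION 3.1 style particle file with one row per line in a `data_particles` loop; it keeps the optics table and all column labels and retains a random sample of the particle rows. The function name and the fixed seed are illustrative:

```python
import random


def subsample_particles(text, n, seed=0, block="data_particles"):
    """Return STAR text with at most n randomly chosen rows kept in the given block."""
    lines = text.splitlines()
    row_idx = []
    in_block = in_loop = False
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith("data_"):
            in_block, in_loop = (line == block), False
        elif in_block and line == "loop_":
            in_loop = True
        elif in_block and in_loop and line and not line.startswith(("_", "#")):
            row_idx.append(i)  # a data row of the particle table
    keep = set(random.Random(seed).sample(row_idx, min(n, len(row_idx))))
    drop = set(row_idx) - keep
    return "\n".join(l for i, l in enumerate(lines) if i not in drop) + "\n"
```

The image stacks referenced by the surviving rows would still have to be shared alongside the STAR file, and the subset is only useful if the refinement still crashes when run on it.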

jhansen6 commented 4 years ago

Okay thanks. I have literally spent upwards of 100 hours trying to find workarounds for this error. It takes up most of my days for weeks on end.