3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Problem with Bayesian Polishing : Incomplete third order polynomial values #955

Open subhrob15 opened 1 year ago

subhrob15 commented 1 year ago

Hey, whenever I try to run the polish step of Bayesian polishing, the job exits with an error saying "incomplete third order polynomial values". I have attached a screenshot of the error. This only happens when I run the job with MPI processes; if I run it locally the error does not occur, but then the whole job takes a very long time. Could someone indicate how to fix this issue? Thanks. [Screenshot: polishing_error]

biochem-fan commented 1 year ago

Please respect our issue template.

You didn't provide essential information (e.g. the version of RELION, the full command line, etc.), and you posted only a screenshot of error messages that should have been pasted as text.

subhrob15 commented 1 year ago

I am really sorry, I wasn't aware of this.

Job options:
RELION version: 3.1.3 and 4.0.0
Type of job: Bayesian Polishing
Number of MPI processes: 3, 5 or 6 (all gave the same issue)
Number of threads: 1

The command used:

`which relion_motion_refine_mpi` --i run_data.star --f PostProcess/job001/postprocess.star --corr_mic corrected_micrographs.star --first_frame 1 --last_frame -1 --o Polish/job016/ --params_file Polish/job002/opt_params_all_groups.txt --combine_frames --bfac_minfreq 20 --bfac_maxfreq -1 --only_do_unfinished --j 12 --pipeline_control Polish/job016

This error only occurs when the job is run with MPI processes. With the following command the job works, but again it takes a very long time:

`which relion_motion_refine` --i Refine3D/job052/run_data_inverted.star --f PostProcess/job061/postprocess.star --corr_mic MotionCorr/job028/corrected_micrographs.star --first_frame 1 --last_frame -1 --o Polish/job154/ --float16 --params_file Polish/job150/opt_params_all_groups.txt --combine_frames --bfac_minfreq 20 --bfac_maxfreq -1 --only_do_unfinished --j 9 --pipeline_control Polish/job154/

Thanks

biochem-fan commented 1 year ago

> if I am running it locally this issue is not coming

Does it go to completion albeit slower?

subhrob15 commented 1 year ago

Yes, it does go to completion, but it takes more than a week, which is very slow considering that there are only 1200 micrographs.

biochem-fan commented 1 year ago

My initial guess was that one or more movies had corrupted motion STAR files. But that would kill non-MPI jobs as well. So this hypothesis is not correct.

Does this happen on all datasets you process on this machine? Was the dataset motion-corrected by RELION's implementation or UCSF MotionCor2?

subhrob15 commented 1 year ago

The dataset was motion-corrected using RELION's own implementation. I forgot to mention that the dataset was initially processed in cryoSPARC and converted to RELION format using csparc2star.py. After that, I also ran a re-extraction job in RELION to check that the coordinates are correct, and I used the re-extracted particles for polishing. This worked when I ran it locally rather than on the cluster, and the map quality also improved. But when I ran it with MPI processes, it gave this error.

biochem-fan commented 1 year ago

What happens if you Polish only one movie? Does it occur on any movie?

subhrob15 commented 1 year ago

Hey, I tried it with only one movie, and it did work with MPI processes.

biochem-fan commented 1 year ago

Can you find the offending movie?

subhrob15 commented 1 year ago

Sorry, I did not understand what you mean by "offending".

biochem-fan commented 1 year ago

Because one movie was fine, probably not all movies are bad. Please find which movie(s) cause the crash.

subhrob15 commented 1 year ago

Okay, but could you suggest a better way to go through all the movies than checking them one by one?

biochem-fan commented 1 year ago

If there is only one problematic movie, you can use binary search. That is, split the dataset in half. If the first half is successful, the latter half contains the bad movie; if the first half fails, the bad movie is in that half. Split the half that contains the bad movie into two and repeat the procedure.
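
The halving procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: exactly one movie is bad, the failure is reproducible, and `job_succeeds` is a hypothetical callback (not part of RELION) that would run polishing on the given subset of movies and report whether the job finished without the error. The movie names are made up for the demonstration.

```python
from typing import Callable, Sequence


def find_bad_movie(
    movies: Sequence[str],
    job_succeeds: Callable[[Sequence[str]], bool],
) -> str:
    """Locate the single failing movie by repeated halving.

    Assumes exactly one movie in `movies` makes the job fail and that
    `job_succeeds(subset)` (hypothetical) returns True when a job on
    `subset` completes without error.
    """
    lo, hi = 0, len(movies)  # the bad movie is somewhere in movies[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if job_succeeds(movies[lo:mid]):
            lo = mid  # first half is clean, so the bad movie is in the latter half
        else:
            hi = mid  # first half failed, so the bad movie is in it
    return movies[lo]


if __name__ == "__main__":
    # Stand-in data: 1200 made-up movie names, one of which is "bad".
    movies = [f"movie_{i:04d}.tiff" for i in range(1200)]
    bad = "movie_0837.tiff"
    runs = 0

    def fake_job(subset: Sequence[str]) -> bool:
        """Pretend to run a job; it fails only if the bad movie is included."""
        global runs
        runs += 1
        return bad not in subset

    print(find_bad_movie(movies, fake_job), "found after", runs, "test runs")
```

For 1200 movies this needs at most 11 test jobs instead of up to 1200. If the crash depends on something other than a single movie (for example, on how movies are distributed over MPI processes), the assumption behind the search does not hold and the result would not be meaningful.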