3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

relion_run_motioncorr --use_own --only_do_unfinished runs on all files (not just unfinished) #1174

Closed davidheisenberg closed 3 months ago

davidheisenberg commented 3 months ago

Hello,

We recently ran a motion correction job using RELION version: 4.0.1-commit-e5c483. Everything works well, except that --only_do_unfinished does not seem to work: movies that already have motion-corrected output are corrected again. We use this option to run motion correction while data is still coming in, so the input movies.star file grows over time.

I replicated the behavior with a subset of 5 movies. I also replicated the behavior with RELION version: 4.0.0-commit-386694 and RELION version: 4.0-beta-2-commit-b5a555. Every time, all 5 movies are motion corrected again and the original output files are overwritten.

Perhaps noteworthy: --only_do_unfinished behaves as expected with CTF estimation. There, only the files that do not yet have output files are listed at the beginning of the log file. With motion correction, all files are listed even if they already have corresponding output files.

Is it possible this is happening because the movies are .tif while the output files are .mrc?

Environment:

Dataset: 11,440 K3 super-resolution movies (*.tif)

Job options:

#!/bin/bash
#SBATCH -p long
#SBATCH -e mg_aln/all/run5.err
#SBATCH -o mg_aln/all/run5.out
#SBATCH --ntasks=5
#SBATCH --nodes=1
#SBATCH -w compute-6-10
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=7g

mpiexec `which relion_run_motioncorr_mpi` --i Import/job001/movies.star --o mg_aln/all --first_frame_sum 1 --last_frame_sum -1 --use_own --j 4 --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting --grouping_for_ps 4 --only_do_unfinished

biochem-fan commented 3 months ago

Could you please check this with the latest version of RELION 4.0.x or 5.0? If the problem persists, I will investigate.

biochem-fan commented 3 months ago

Please also make sure your network storage does not have a synchronization problem. On a badly configured system, new files created on one node take time to become visible on other nodes. In that case, the second execution of the continuation job would not see the output files from the first execution.

For example, you can put ls -lR mg_aln/all at the beginning of the job script to see if previous files are there.
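The visibility check suggested above can be made a little more explicit in the job script. The following is a minimal sketch, not part of RELION: `count_outputs` is a hypothetical helper that counts the `.mrc` files (excluding `_PS.mrc` power spectra) visible under the output directory on the node where the job runs.

```shell
# Hypothetical helper (not a RELION command): count the .mrc files that
# are visible under an output directory from the current node.
# Power spectra (*_PS.mrc) are excluded; any other .mrc file is counted.
count_outputs() {
    outdir=$1
    if [ ! -d "$outdir" ]; then
        # Directory not visible (yet): report zero instead of failing.
        echo 0
        return 0
    fi
    # tr strips the padding that some wc implementations add.
    find "$outdir" -type f -name '*.mrc' ! -name '*_PS.mrc' | wc -l | tr -d ' '
}

# Example use at the top of a job script:
#   echo "visible outputs: $(count_outputs mg_aln/all)"
```

If the number printed at the start of a continuation job is lower than the number of micrographs the previous run wrote, stale network storage is a plausible suspect.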

davidheisenberg commented 3 months ago

Thank you for your reply!

I redid the motion correction with RELION version: 4.0.1-commit-ee4669 and RELION version: 5.0-beta-3-commit-b978db, and the behavior remained the same. I did this with a subset of 5 K3 movies in .tif format. I included the ls command before the relion command to check the existence of the output files, and the log file output is below. (Ignore the _shr.mrc files; they are binned images made for viewing micrographs in bulk. They do not get overwritten, but all other files do.)

mg_aln/test_unfinished/movies/:
total 446334
-rw-r--r-- 1 cindycheng dsegroup 6880 Aug 20 16:51 Ohio_AD_20240808_00901.log
-rw-r--r-- 1 cindycheng dsegroup 94280704 Aug 20 16:51 Ohio_AD_20240808_00901.mrc
-rw-r--r-- 1 cindycheng dsegroup 1049600 Aug 20 16:50 Ohio_AD_20240808_00901_PS.mrc
-rw-r--r-- 1 cindycheng dsegroup 104461 Aug 20 16:51 Ohio_AD_20240808_00901_shifts.eps
-rw-r--r-- 1 cindycheng dsegroup 1472704 Aug 18 13:42 Ohio_AD_20240808_00901_shr.mrc
-rw-r--r-- 1 cindycheng dsegroup 81745 Aug 20 16:51 Ohio_AD_20240808_00901.star
-rw-r--r-- 1 cindycheng dsegroup 5199 Aug 20 16:51 Ohio_AD_20240808_00902.log
-rw-r--r-- 1 cindycheng dsegroup 94280704 Aug 20 16:51 Ohio_AD_20240808_00902.mrc
-rw-r--r-- 1 cindycheng dsegroup 1049600 Aug 20 16:50 Ohio_AD_20240808_00902_PS.mrc
-rw-r--r-- 1 cindycheng dsegroup 138132 Aug 20 16:51 Ohio_AD_20240808_00902_shifts.eps
-rw-r--r-- 1 cindycheng dsegroup 1472704 Aug 18 13:42 Ohio_AD_20240808_00902_shr.mrc
-rw-r--r-- 1 cindycheng dsegroup 103772 Aug 20 16:51 Ohio_AD_20240808_00902.star
-rw-r--r-- 1 cindycheng dsegroup 6281 Aug 20 16:51 Ohio_AD_20240808_00903.log
-rw-r--r-- 1 cindycheng dsegroup 94280704 Aug 20 16:51 Ohio_AD_20240808_00903.mrc
-rw-r--r-- 1 cindycheng dsegroup 1049600 Aug 20 16:50 Ohio_AD_20240808_00903_PS.mrc
-rw-r--r-- 1 cindycheng dsegroup 123703 Aug 20 16:51 Ohio_AD_20240808_00903_shifts.eps
-rw-r--r-- 1 cindycheng dsegroup 1472704 Aug 18 13:42 Ohio_AD_20240808_00903_shr.mrc
-rw-r--r-- 1 cindycheng dsegroup 97130 Aug 20 16:51 Ohio_AD_20240808_00903.star
-rw-r--r-- 1 cindycheng dsegroup 5166 Aug 20 16:51 Ohio_AD_20240808_00904.log
-rw-r--r-- 1 cindycheng dsegroup 94280704 Aug 20 16:51 Ohio_AD_20240808_00904.mrc
-rw-r--r-- 1 cindycheng dsegroup 1049600 Aug 20 16:50 Ohio_AD_20240808_00904_PS.mrc
-rw-r--r-- 1 cindycheng dsegroup 138190 Aug 20 16:51 Ohio_AD_20240808_00904_shifts.eps
-rw-r--r-- 1 cindycheng dsegroup 1472704 Aug 18 13:42 Ohio_AD_20240808_00904_shr.mrc
-rw-r--r-- 1 cindycheng dsegroup 102368 Aug 20 16:51 Ohio_AD_20240808_00904.star
-rw-r--r-- 1 cindycheng dsegroup 6628 Aug 20 16:51 Ohio_AD_20240808_00905.log
-rw-r--r-- 1 cindycheng dsegroup 94280704 Aug 20 16:51 Ohio_AD_20240808_00905.mrc
-rw-r--r-- 1 cindycheng dsegroup 1049600 Aug 20 16:50 Ohio_AD_20240808_00905_PS.mrc
-rw-r--r-- 1 cindycheng dsegroup 110101 Aug 20 16:51 Ohio_AD_20240808_00905_shifts.eps
-rw-r--r-- 1 cindycheng dsegroup 1472704 Aug 18 13:42 Ohio_AD_20240808_00905_shr.mrc
-rw-r--r-- 1 cindycheng dsegroup 90124 Aug 20 16:51 Ohio_AD_20240808_00905.star
=== RELION MPI setup ===

biochem-fan commented 3 months ago

Thanks for testing. I will investigate.

biochem-fan commented 3 months ago

I cannot reproduce your problem with/without MPI.

$ ~/prog/relion/build/bin/relion_run_motioncorr --i just1.star --o /dev/shm/mcor/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 12 --float16 --bin_factor 1 --bfactor 150 --dose_per_frame 1.277 --preexposure 0 --patch_x 4 --patch_y 4 --eer_grouping 32 --gainref RawMovies/gain.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 3 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * BZ2Movies/20170629_00021_frameImage.mrc.bz2
 Correcting beam-induced motions using our own implementation ...
  11/  11 sec ............................................................~~(,_,">
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: /dev/shm/mcor/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Done! Written: /dev/shm/mcor/logfile.pdf

$ ~/prog/relion/build/bin/relion_run_motioncorr --i just1.star --o /dev/shm/mcor/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 12 --float16 --bin_factor 1 --bfactor 150 --dose_per_frame 1.277 --preexposure 0 --patch_x 4 --patch_y 4 --eer_grouping 32 --gainref RawMovies/gain.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 3 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
 Correcting beam-induced motions using our own implementation ...
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: /dev/shm/mcor/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Did not find any of the expected EPS files to generate a PDF file
 + Will make an empty PDF-file in /dev/shm/mcor/batch.pdf
 Done! Written: /dev/shm/mcor/logfile.pdf

$ mpirun -np 1 ~/prog/relion/build/bin/relion_run_motioncorr_mpi --i just1.star --o /dev/shm/mcor/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 12 --float16 --bin_factor 1 --bfactor 150 --dose_per_frame 1.277 --preexposure 0 --patch_x 4 --patch_y 4 --eer_grouping 32 --gainref RawMovies/gain.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 3 --only_do_unfinished
 === RELION MPI setup ===
 + Number of MPI processes                 = 1
 + Leader      (0) runs on host            = embox3
 ==========================
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
 Correcting beam-induced motions using our own implementation ...
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: /dev/shm/mcor/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Did not find any of the expected EPS files to generate a PDF file
 + Will make an empty PDF-file in /dev/shm/mcor/batch.pdf
 Done! Written: /dev/shm/mcor/logfile.pdf

biochem-fan commented 3 months ago

To isolate the problem, can you test it on local storage and/or without job submission (e.g. on a local computer or within an interactive job)?

davidheisenberg commented 3 months ago

I reproduced the problem when logged into a cluster node interactively. I will try with local storage next.

cindycheng@compute-6-10:/eisenberg2/cindycheng/CNSI/20240808_ohioAD$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * movies/Ohio_AD_20240808_00901.tif
  * movies/Ohio_AD_20240808_00902.tif
  * movies/Ohio_AD_20240808_00903.tif
  * movies/Ohio_AD_20240808_00904.tif
  * movies/Ohio_AD_20240808_00905.tif
 Correcting beam-induced motions using our own implementation ...
4.48/4.48 min ............................................................~~(,_,">
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: mg_aln/test_unfinished/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Done! Written: mg_aln/test_unfinished/logfile.pdf
cindycheng@compute-6-10:/eisenberg2/cindycheng/CNSI/20240808_ohioAD$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * movies/Ohio_AD_20240808_00901.tif
  * movies/Ohio_AD_20240808_00902.tif
  * movies/Ohio_AD_20240808_00903.tif
  * movies/Ohio_AD_20240808_00904.tif
  * movies/Ohio_AD_20240808_00905.tif
 Correcting beam-induced motions using our own implementation ...
4.50/4.50 min ............................................................~~(,_,">
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: mg_aln/test_unfinished/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Done! Written: mg_aln/test_unfinished/logfile.pdf

davidheisenberg commented 3 months ago

I reproduced the problem on a local computer. Please let me know if you have any other suggestions.

davboyer@tesla:~/temp$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * movies/Ohio_AD_20240808_00901.tif
  * movies/Ohio_AD_20240808_00902.tif
  * movies/Ohio_AD_20240808_00903.tif
  * movies/Ohio_AD_20240808_00904.tif
  * movies/Ohio_AD_20240808_00905.tif
 Correcting beam-induced motions using our own implementation ...
3.32/3.32 min ............................................................~~(,_,">
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: mg_aln/test_unfinished/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Done! Written: mg_aln/test_unfinished/logfile.pdf
davboyer@tesla:~/temp$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * movies/Ohio_AD_20240808_00901.tif
  * movies/Ohio_AD_20240808_00902.tif
  * movies/Ohio_AD_20240808_00903.tif
  * movies/Ohio_AD_20240808_00904.tif
  * movies/Ohio_AD_20240808_00905.tif
 Correcting beam-induced motions using our own implementation ...
3.20/3.20 min ............................................................~~(,_,">
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: mg_aln/test_unfinished/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Done! Written: mg_aln/test_unfinished/logfile.pdf

biochem-fan commented 3 months ago

In src/motioncor_runner.cpp, there are the following lines:

                if (continue_old)
                {
                        FileName fn_avg = getOutputFileNames(fn_mic_given_all[imic]);
                        if (exists(fn_avg) && exists(fn_avg.withoutExtension() + ".star") &&
                            (grouping_for_ps <= 0 || exists(fn_avg.withoutExtension() + "_PS.mrc")))
                        {
                                process_this = false; // already done
                        }
                }

This checks whether a movie has been processed or not.

After the "FileName fn_avg = getOutputFileNames(fn_mic_given_all[imic]);" line, please add the following lines, recompile (rerun make) and repeat your test.

std::cout << fn_avg << " " << (fn_avg.withoutExtension() + ".star") << " " << (fn_avg.withoutExtension() + "_PS.mrc") << std::endl;
std::cout << exists(fn_avg) << " " << exists(fn_avg.withoutExtension() + ".star") << " " << exists(fn_avg.withoutExtension() + "_PS.mrc") << std::endl;

This will tell us which condition is not met.
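The same three conditions can also be checked from the shell without recompiling. This is a rough sketch under an assumption: the expected output name is taken to be the `--o` string followed by the movie path with its extension replaced, joined by plain concatenation. The debug output later in this thread is consistent with that, but `getOutputFileNames` may do more. `is_done` is a hypothetical helper, not a RELION command.

```shell
# Hypothetical helper mirroring the continuation check quoted above.
# Assumption: output base name = <--o string> + <movie path without its
# last extension>, concatenated verbatim with no separator added.
is_done() {
    out_prefix=$1              # value passed to --o, used as given
    movie=$2                   # movie path as listed in the input STAR file
    grouping_for_ps=${3:-0}    # value of --grouping_for_ps (0 if unset)

    base="${out_prefix}${movie%.*}"   # strip the last extension, e.g. .tif

    [ -e "${base}.mrc" ]  || return 1   # motion-corrected average
    [ -e "${base}.star" ] || return 1   # per-micrograph STAR file
    if [ "$grouping_for_ps" -gt 0 ]; then
        [ -e "${base}_PS.mrc" ] || return 1   # power spectrum, if requested
    fi
    return 0
}
```

For example, `is_done mg_aln/test_unfinished/ movies/Ohio_AD_20240808_00901.tif 4` succeeds only if all three files exist. Dropping the trailing `/` from the first argument makes it look under `mg_aln/test_unfinishedmovies/` instead.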

davidheisenberg commented 3 months ago

Hello,

I understand the problem now. I did not put a trailing "/" after the directory in my --o argument. Sorry, it was a trivial mistake, and thank you for your help.

davboyer@tesla:~/temp$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00901.mrc mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00901.star mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00901_PS.mrc
0 0 0
mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00902.mrc mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00902.star mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00902_PS.mrc
0 0 0
mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00903.mrc mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00903.star mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00903_PS.mrc
0 0 0
mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00904.mrc mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00904.star mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00904_PS.mrc
0 0 0
mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00905.mrc mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00905.star mg_aln/test_unfinishedmovies/Ohio_AD_20240808_00905_PS.mrc
0 0 0
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
  * movies/Ohio_AD_20240808_00901.tif
  * movies/Ohio_AD_20240808_00902.tif
  * movies/Ohio_AD_20240808_00903.tif
  * movies/Ohio_AD_20240808_00904.tif
  * movies/Ohio_AD_20240808_00905.tif
 Correcting beam-induced motions using our own implementation ...
000/??? sec ~~(,_,">                                                          [oo]^C
davboyer@tesla:~/temp$ `which relion_run_motioncorr`  --i Import/test_unfinished/movies.star --o mg_aln/test_unfinished/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4  --bin_factor 2 --bfactor 150 --dose_per_frame 1.05 --preexposure 0 --patch_x 6 --patch_y 5 --gainref /auto_nfs/eisenberg2/cindycheng/CNSI/20240808_ohioAD/movies/SuperRef_Ohio_AD_20240808_00005.mrc --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 4 --only_do_unfinished
mg_aln/test_unfinished/movies/Ohio_AD_20240808_00901.mrc mg_aln/test_unfinished/movies/Ohio_AD_20240808_00901.star mg_aln/test_unfinished/movies/Ohio_AD_20240808_00901_PS.mrc
1 1 1
mg_aln/test_unfinished/movies/Ohio_AD_20240808_00902.mrc mg_aln/test_unfinished/movies/Ohio_AD_20240808_00902.star mg_aln/test_unfinished/movies/Ohio_AD_20240808_00902_PS.mrc
1 1 1
mg_aln/test_unfinished/movies/Ohio_AD_20240808_00903.mrc mg_aln/test_unfinished/movies/Ohio_AD_20240808_00903.star mg_aln/test_unfinished/movies/Ohio_AD_20240808_00903_PS.mrc
1 1 1
mg_aln/test_unfinished/movies/Ohio_AD_20240808_00904.mrc mg_aln/test_unfinished/movies/Ohio_AD_20240808_00904.star mg_aln/test_unfinished/movies/Ohio_AD_20240808_00904_PS.mrc
1 1 1
mg_aln/test_unfinished/movies/Ohio_AD_20240808_00905.mrc mg_aln/test_unfinished/movies/Ohio_AD_20240808_00905.star mg_aln/test_unfinished/movies/Ohio_AD_20240808_00905_PS.mrc
1 1 1
 Using our own implementation based on MOTIONCOR2 algorithm
 to correct beam-induced motion for the following micrographs: 
 (skipping all micrographs for which a corrected movie already exists) 
 Correcting beam-induced motions using our own implementation ...
 Generating joint STAR file ... 
   0/   0 sec ............................................................~~(,_,">
 Written: mg_aln/test_unfinished/corrected_micrographs.star
 Now generating logfile.pdf ... 
 Did not find any of the expected EPS files to generate a PDF file
 + Will make an empty PDF-file in mg_aln/test_unfinished/batch.pdf
 Done! Written: mg_aln/test_unfinished/logfile.pdf

biochem-fan commented 3 months ago

@davidheisenberg Thank you very much for your testing. I updated the code (1441120) so that "/" is added earlier; now the continuation check and the actual file writing use the same output file path. I also made a new release 4.0.2 which includes this (and other) bug fixes.
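For versions before this fix, a wrapper script can guard against the missing separator itself. A minimal sketch, assuming a POSIX shell; `ensure_trailing_slash` is a hypothetical helper, not part of RELION:

```shell
# Hypothetical helper: print a directory argument with a trailing "/".
# An argument that already ends in "/" is printed unchanged.
ensure_trailing_slash() {
    # Refuse an empty argument so we never silently produce "/".
    [ -n "$1" ] || return 1
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Example use (remaining options as in the original command):
#   outdir=$(ensure_trailing_slash mg_aln/test_unfinished)
#   relion_run_motioncorr --i Import/test_unfinished/movies.star --o "$outdir" --use_own --only_do_unfinished
```

Normalizing the path once, before it is used anywhere, follows the same idea as the fix described above: the continuation check and the file writing then see the same string.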

davidheisenberg commented 3 months ago

Happy to help!