3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

CTF estimation creates symlinks with absolute paths #1113

Open DimitriosBellos opened 4 months ago

DimitriosBellos commented 4 months ago

Dear Relion developers,

Hi, my name is Dimitrios Bellos and I am a member of the AI & I team at the Rosalind Franklin Institute. Our team helps support Franklin RELION users with issues they run into.

We recently noticed issues arising because the CTF estimation step creates symlinks to the motion-corrected data using absolute paths.

Example in CtfFind/job003/: 'Position_99_035[-61_00]_EER_PS.mrc' -> '/<absolute-path>/MotionCorr/job002/<data-directory>/Position_99_035[-61_00]_EER_PS.mrc'

This can cause issues if the whole RELION project directory is moved. This is common for us, because a whole RELION project directory may be moved from our compute infrastructure to the Baskerville HPC and vice versa. Would it be possible to create relative symlinks instead? For example: 'Position_99_035[-61_00]_EER_PS.mrc' -> '../../../MotionCorr/job002/<data-directory>/Position_99_035[-61_00]_EER_PS.mrc'. You could even generate the relative path using the realpath command (see https://stackoverflow.com/questions/2564634/convert-absolute-path-into-relative-path-given-a-current-directory-using-bash ).
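To illustrate the request, here is a minimal sketch of creating such a relative link in bash. It assumes GNU coreutils (`realpath --relative-to`). `make_relative_link` and the file names in the demo are hypothetical and not part of RELION; this is not how RELION itself creates the links.

```bash
#!/bin/bash
# Sketch only: make_relative_link is a hypothetical helper, not a RELION command.
# It stores the link target relative to the link's own directory, so the link
# keeps working when the whole project directory is moved.
make_relative_link() {
    local target="$1" link="$2"
    local link_dir rel
    link_dir="$(dirname "$link")"
    mkdir -p "$link_dir"
    # GNU realpath expresses the target relative to the link's directory.
    rel="$(realpath --relative-to="$link_dir" "$target")"
    ln -sfn "$rel" "$link"
}

# Demo in a throwaway directory that mimics the layout described in this issue.
demo="$(mktemp -d)"
mkdir -p "$demo/proj/MotionCorr/job002/data"
touch "$demo/proj/MotionCorr/job002/data/mic_PS.mrc"
make_relative_link "$demo/proj/MotionCorr/job002/data/mic_PS.mrc" \
                   "$demo/proj/CtfFind/job003/data/mic_PS.mrc"

# Move the whole project; the link still resolves because its target is relative.
mv "$demo/proj" "$demo/moved"
readlink "$demo/moved/CtfFind/job003/data/mic_PS.mrc"
```

The number of `../` components depends on how deep the link sits below the project directory, which is why computing it with `realpath` is safer than hard-coding it.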

Alternatively, could this be done without using symlinks at all?

Kind regards, Dimitrios Bellos

biochem-fan commented 4 months ago

This can cause issues if the whole Relion Project directory is moved.

I doubt this. These links are created by a CTFFIND job and used only by the job itself. Thus, unless you move the project directory before the job completes, it should be fine. Am I missing other failure modes?

biochem-fan commented 4 months ago

[-61_00]

Oh, this is STA, not SPA. I know nothing about the STA workflow. STA related issues need to be dealt with by others.

DimitriosBellos commented 4 months ago

To help, this is the script we run on the HPC:

#!/bin/bash

#SBATCH --qos=rfi
#SBATCH --account=<account-name>
#SBATCH --time=0-01:00:00
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1

module purge
module load baskerville
module load RELION

# Import (relion)
mkdir -p Import/job001
time relion_import  --do_movies  --optics_group_name "opticsGroup1" --angpix 1.85 --kV 300 --Cs 2.7 --Q0 0.1 --beamtilt_x 0 --beamtilt_y 0 --i "data/HeLa_argon/Position_*.eer" --odir Import/job001/ --ofile movies.star --pipeline_control Import/job001/

# Motion correction (relion)
time srun `which relion_run_motioncorr_mpi` --i Import/job001/movies.star --o MotionCorr/job002/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4 --float16 --bin_factor 1 --bfactor 150 --dose_per_frame 0.14 --preexposure 0 --patch_x 5 --patch_y 5 --eer_grouping 32 --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 29   --pipeline_control MotionCorr/job002/

# CTF correction (relion)
time srun `which relion_run_ctffind_mpi` --i MotionCorr/job002/corrected_micrographs.star --o CtfFind/job003/ --Box 512 --ResMin 30 --ResMax 5 --dFMin 5000 --dFMax 50000 --FStep 500 --dAst 100 --ctffind_exe ctffind --ctfWin -1 --is_ctffind4  --fast_search  --use_given_ps   --pipeline_control CtfFind/job003/

DimitriosBellos commented 4 months ago

The problem appears after the CTF estimation process completes. A directory is created under <RELION-project-directory-name>/CtfFind/job003/ that mirrors the structure of the data directory at the RELION project directory level (<RELION-project-directory-name>/data/HeLa_argon/). It looks like this: <RELION-project-directory-name>/CtfFind/job003/data/HeLa_argon/, and many symlinks are created in it.

If these symlinks are no longer needed after the CTF estimation process completes, could you please add a step to delete them at that point?

If they are still needed after the CTF estimation completes, could you change the code so that they are created using relative paths rather than absolute paths? That way the symlinks will not break even if the whole RELION project directory is moved elsewhere.

It is a minor issue, but if the symlinks need to remain after the CTF estimation completes, having them in a form that cannot break when the entire project directory is moved would be very useful.
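As a stopgap on the user side, existing absolute links could be rewritten as relative ones before the project is moved. The following is a rough sketch, assuming GNU `find` and `realpath`. `relativize_links` is a hypothetical helper and the demo paths are made up for illustration.

```bash
#!/bin/bash
# Sketch only: relativize_links is a hypothetical helper, not a RELION command.
# Run it BEFORE moving the project, while the absolute targets still exist.
relativize_links() {
    local job_dir="$1" link target
    # -print0 / read -d '' keep file names with brackets or spaces intact.
    find "$job_dir" -type l -print0 | while IFS= read -r -d '' link; do
        target="$(readlink "$link")"
        case "$target" in
            /*) ;;            # absolute target: rewrite it below
            *)  continue ;;   # already relative: leave it alone
        esac
        # Skip links that are already broken; there is nothing to resolve.
        [ -e "$target" ] || continue
        ln -sfn "$(realpath --relative-to="$(dirname "$link")" "$target")" "$link"
    done
}

# Demo in a throwaway directory with one absolute link.
demo="$(mktemp -d)"
name='Position_1_001[-61_00]_EER_PS.mrc'
mkdir -p "$demo/proj/MotionCorr/job002/data" "$demo/proj/CtfFind/job003/data"
touch "$demo/proj/MotionCorr/job002/data/$name"
ln -s "$demo/proj/MotionCorr/job002/data/$name" "$demo/proj/CtfFind/job003/data/$name"

relativize_links "$demo/proj/CtfFind/job003"
readlink "$demo/proj/CtfFind/job003/data/$name"
```

In the Slurm script above, a call such as `relativize_links CtfFind/job003` would go after the CTF estimation step.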

biochem-fan commented 4 months ago

The symlinks are not used after the job completes, as far as SPA is concerned. I think (not confirmed) they get deleted when a user runs "Gentle Clean" on the job from the GUI.

can you please add a step to delete them after the CTF estimation is completed?

This is a valid suggestion but because it is harmless (and nobody complained for at least five years), my priority is low. A pull request is welcomed.

DimitriosBellos commented 4 months ago

No problem, we can delete the symlinks ourselves, if the deletion is otherwise meant to be done from the GUI.

We are currently writing production scripts so that a Slurm script submitted to an HPC will run a sequence of processes one after the other automatically. For this reason, we are running RELION solely from the command line.

It might be a good idea to note in the documentation, for those who run RELION only from the command line, that it is OK to delete any symlinks created by CTFFind after it finishes.
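For command-line pipelines, the cleanup could be as small as the following sketch. It relies on the statements in this thread that the links are not used once the job has completed. `remove_job_symlinks` is a hypothetical helper, and the demo files are invented for illustration.

```bash
#!/bin/bash
# Sketch only: remove_job_symlinks is a hypothetical helper, not a RELION command.
remove_job_symlinks() {
    local job_dir="$1"
    # -type l matches symlinks only, so regular output files are left untouched.
    find "$job_dir" -type l -delete
}

# Demo in a throwaway directory: one regular output file and one symlink.
demo="$(mktemp -d)"
mkdir -p "$demo/CtfFind/job003/data"
touch "$demo/source_PS.mrc" "$demo/CtfFind/job003/data/output.txt"
ln -s "$demo/source_PS.mrc" "$demo/CtfFind/job003/data/source_PS.mrc"

remove_job_symlinks "$demo/CtfFind/job003"
remaining_links="$(find "$demo/CtfFind/job003" -type l | wc -l)"
echo "symlinks left: $remaining_links"
```

In a Slurm script like the one above, `remove_job_symlinks CtfFind/job003` would be called only after the CTF estimation step has finished.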

Happy if you close the issue-ticket

biochem-fan commented 4 months ago

FYI:

Unfortunately I cannot help with the latter because I don't use the feature myself.

scheres commented 4 months ago

Just want to confirm that STA behaves in the exact same way as SPA here. The PS.mrc files get generated during motioncorr and are only temporarily symlinked. Yes, deleting them would be cleaner, but this should not cause any issues.

xinsheng44 commented 3 months ago

Hello, I have the same problem. When I run CTF estimation from a shell script, symlinks with full (absolute) paths are created in the CtfFind directory, and I do not know how to solve this.

In addition, when I run CTF estimation I get the error “ERROR: Failed to make a symlink from A to B”. The symlink already exists under CtfFind, yet the error is still displayed. How can this be solved?
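One check that may help narrow this down is to list links that exist but no longer resolve, for example absolute links left over from a previous project location. This is only a diagnostic sketch: it assumes GNU `find`, it does not confirm the cause of the error, and `list_broken_links` is a hypothetical helper.

```bash
#!/bin/bash
# Sketch only: list_broken_links is a hypothetical helper, not a RELION command.
list_broken_links() {
    # With GNU find, -xtype l matches symlinks whose target cannot be resolved.
    find "$1" -xtype l -print
}

# Demo in a throwaway directory: one valid link and one stale link.
demo="$(mktemp -d)"
touch "$demo/real_PS.mrc"
ln -s "$demo/real_PS.mrc" "$demo/good_link.mrc"
ln -s "$demo/no_such_file.mrc" "$demo/stale_link.mrc"

broken="$(list_broken_links "$demo")"
echo "$broken"
```

On a real project the call would be something like `list_broken_links CtfFind/job003`.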

xinsheng44 commented 3 months ago

The symlinks are not used after the job completes, as far as SPA is concerned. I think (not confirmed) they get deleted when a user runs "Gentle Clean" on the job from the GUI.

can you please add a step to delete them after the CTF estimation is completed?

This is a valid suggestion but because it is harmless (and nobody complained for at least five years), my priority is low. A pull request is welcomed.

I tested with both SPA and STA, and both had the same problem.