IsoNet-cryoET / spIsoNet

Overcoming the preferred orientation problem in cryoEM with self-supervised deep-learning
https://www.biorxiv.org/content/10.1101/2024.04.11.588921v1
MIT License

Example run: No such file or directory: 'Refine3D/job001/run_it001_half1_class001_unfil.mrc' #13

Open: wlugmayr opened this issue 4 months ago

wlugmayr commented 4 months ago

Here is my command line:

srun --mpi=pmi2 $(which relion_refine_mpi) --o Refine3D/job001/run --auto_refine --split_random_halves --i job025_tutorial.star --ref HA_reference.mrc --firstiter_cc --ini_high 10 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 170 --flatten_solvent --zero_mask --solvent_mask mask.mrc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 3 --offset_range 5 --offset_step 2 --sym C3 --low_resol_join_halves 40 --norm --scale --j 1 --gpu "" --external_reconstruct --keep_lowres --pipeline_control Refine3D/job001/

Here are parts of run.out:

Expectation iteration 1
7.45/7.43 min ............................................................~~(,_,">
Averaging half-reconstructions up to 40 Angstrom resolution to prevent diverging orientations ...
Note that only for higher resolutions the FSC-values are according to the gold-standard!
Calculating gold-standard FSC ...
Maximization ...

And here is run.err:

The following warnings were encountered upon command-line parsing:
WARNING: Option --keep_lowres is not a valid RELION argument
Traceback (most recent call last):
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py", line 362, in <module>
    shutil.copy(mrc_unfil, mrc_unfil_backup)
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/shutil.py", line 417, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'Refine3D/job001/run_it001_half1_class001_unfil.mrc'
in: /gpfs/cssb/software/tmp/install/relion-4.0.1/src/backprojector.cpp, line 1294
ERROR: 
ERROR: there was something wrong with system call: /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py Refine3D/job001/run_it001_half1_class001_external_reconstruct.star
=== Backtrace ===
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x69) [0x4c7eb9]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi() [0x44f710]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi12maximizationEv+0x17dc) [0x4ffb4c]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0x482) [0x500b52]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(main+0x59) [0x4b6a49]
/lib64/libc.so.6(+0x3feb0) [0x14a1bf43feb0]
/lib64/libc.so.6(__libc_start_main+0x80) [0x14a1bf43ff60]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_start+0x25) [0x4b9ba5]


$ find Refine3D
Refine3D
Refine3D/job001
Refine3D/job001/run_it000_half2_class001_angdist.bild
Refine3D/job001/run.err
Refine3D/job001/default_pipeline.star
Refine3D/job001/run_it001_half2_class001_external_reconstruct.star
Refine3D/job001/run_it000_sampling.star
Refine3D/job001/run_it001_half1_class001_external_reconstruct_data_real.mrc
Refine3D/job001/run_it001_half2_class001_external_reconstruct_data_real.mrc
Refine3D/job001/run.out
Refine3D/job001/run_it001_half2_class001_external_reconstruct_weight.mrc
Refine3D/job001/run_it000_half1_model.star
Refine3D/job001/run_it000_optimiser.star
Refine3D/job001/run_it000_half1_class001.mrc
Refine3D/job001/run_it001_half1_class001_external_reconstruct.star
Refine3D/job001/run_it000_half2_class001.mrc
Refine3D/job001/.run.err.tail
Refine3D/job001/.run.out.tail
Refine3D/job001/run_submit.script
Refine3D/job001/job_pipeline.star
Refine3D/job001/job.star
Refine3D/job001/run_it000_half2_model.star
Refine3D/job001/run_it001_half1_class001_external_reconstruct_data_imag.mrc
Refine3D/job001/run_it000_data.star
Refine3D/job001/run_it001_half2_class001_external_reconstruct_data_imag.mrc
Refine3D/job001/run_it001_half1_class001_external_reconstruct_weight.mrc
Refine3D/job001/note.txt
Refine3D/job001/run_it000_half1_class001_angdist.bild
Refine3D/job001/RELION_JOB_EXIT_FAILURE
Refine3D/job001/run_it001_half1_class001_external_reconstruct.mrc
Refine3D/job001/run_it001_half2_class001_external_reconstruct.mrc
Refine3D/spisonet
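Comparing the paths in run.err suggests how the wrapper finds its input: RELION passes run_it001_half1_class001_external_reconstruct.star, and the file the wrapper then fails to open is run_it001_half1_class001_unfil.mrc. The sketch below is a hypothetical diagnostic (not part of spIsoNet or RELION) that assumes this naming pattern, inferred from this log only, and checks whether the expected unfiltered map exists. The later error in this thread shows a "corrected_" prefix when --solvent_correct_fsc is used, so the pattern may not cover every case.

```python
from pathlib import Path

# Assumed naming pattern, inferred only from the paths in run.err:
#   <prefix>_external_reconstruct.star  ->  <prefix>_unfil.mrc
STAR_SUFFIX = "_external_reconstruct.star"
UNFIL_SUFFIX = "_unfil.mrc"


def expected_unfil_path(star_path: str) -> str:
    """Return the _unfil.mrc path corresponding to an external-reconstruct STAR file."""
    if not star_path.endswith(STAR_SUFFIX):
        raise ValueError(f"not an external_reconstruct STAR file: {star_path}")
    return star_path[: -len(STAR_SUFFIX)] + UNFIL_SUFFIX


def report_missing(star_path: str) -> bool:
    """Print whether the expected unfiltered map exists; return True if it is present."""
    unfil = expected_unfil_path(star_path)
    present = Path(unfil).is_file()
    print(("found   " if present else "MISSING ") + unfil)
    return present


if __name__ == "__main__":
    import sys

    # Usage: python check_unfil.py <..._external_reconstruct.star> [...]
    for star in sys.argv[1:]:
        report_missing(star)
```

Run from the project directory with Refine3D/job001/run_it001_half1_class001_external_reconstruct.star as the argument, this would be expected to print a MISSING line, since the find listing above contains no file ending in _unfil.mrc.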

procyontao commented 4 months ago

Hi

I wonder whether this error still appears when you add --solvent_correct_fsc to the command.

wlugmayr commented 4 months ago

Hi,

Yes, now it gets to iteration 5.

Command line:

srun --mpi=pmi2 $(which relion_refine_mpi) --o Refine3D/job001/run --auto_refine --split_random_halves --i job025_tutorial.star --ref HA_reference.mrc --firstiter_cc --ini_high 10 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 170 --flatten_solvent --zero_mask --solvent_mask mask.mrc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 3 --offset_range 5 --offset_step 2 --sym C3 --low_resol_join_halves 40 --norm --scale --j 1 --gpu "" --external_reconstruct --keep_lowres --solvent_correct_fsc --pipeline_control Refine3D/job001/

Possibly relevant new error messages:

  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

FileNotFoundError: [Errno 2] No such file or directory: 'Refine3D/job001/corrected_run_it005_half1_class001_unfil.mrc'
in: /gpfs/cssb/software/tmp/install/relion-4.0.1/src/backprojector.cpp, line 1294

I installed torch with: pip install torch --index-url https://download.pytorch.org/whl/cu118

Attachment: logfiles.zip

procyontao commented 4 months ago

Hi,

I have also run into this problem. It happens because the data have to pass through the same network more than once. I do not know the exact solution yet. My current observations are the following (they may not be correct):

  1. This can happen when spIsoNet uses only one GPU.
  2. It is also related to the torch version and the graphics cards.

wlugmayr commented 4 months ago

Yes, with multiple GPUs it works now. At first I did not specify CUDA_VISIBLE_DEVICES and got an error, so I set it to CUDA_VISIBLE_DEVICES=0. But now that I set it on, e.g., a 4-GPU node to CUDA_VISIBLE_DEVICES=0 1 2 3, the program runs to the end without errors in both RELION 4 and RELION 5. In both cases I used the full path to python to avoid clashes with the RELION 5 conda dependencies. Environment-modules style:

setenv RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE {/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py}
setenv CONDA_ENV spisonet-1.0.0
setenv CUDA_VISIBLE_DEVICES {0 1 2 3}
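CUDA_VISIBLE_DEVICES appears in this thread both as a single ID and as a space-separated list (the environment-modules value {0 1 2 3} becomes the string "0 1 2 3"), whereas CUDA documents a comma-separated list such as 0,1,2,3. How spIsoNet itself reads the variable is not shown here, so the helper below is a hypothetical sketch: it assumes numeric device IDs (not GPU UUIDs), accepts either separator, and prints the IDs together with the comma-separated equivalent, which makes it easy to confirm which GPUs a job was meant to get.

```python
import os
import re
from typing import List, Optional


def parse_visible_devices(value: Optional[str]) -> List[int]:
    """Split a CUDA_VISIBLE_DEVICES-style string on commas and/or whitespace.

    Hypothetical helper: accepts both '0,1,2,3' and '0 1 2 3' and returns
    the integer device IDs in order. Unset or empty input gives [].
    Assumes numeric IDs; a GPU UUID entry would raise ValueError in int().
    """
    if not value:
        return []
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    return [int(t) for t in tokens]


def as_comma_list(ids: List[int]) -> str:
    """Render device IDs in the comma-separated form that CUDA documents."""
    return ",".join(str(i) for i in ids)


if __name__ == "__main__":
    ids = parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"))
    print(f"{len(ids)} GPU(s) requested: {ids} -> CUDA_VISIBLE_DEVICES={as_comma_list(ids)}")
```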

Why does your documentation say that spIsoNet does not work with RELION 5? Is the output mrc wrong?

procyontao commented 4 months ago

If you can run it through RELION 5, that is great. We say that spIsoNet does not work with RELION 5 because of clashes with the RELION 5 conda environment or with Blush. It would be great if you could share details on which environment settings RELION 5 needs, for example whether one has to deactivate the RELION 5 conda environment and use spIsoNet's instead.

wlugmayr commented 4 months ago

Well the solution is quite simple:

The trick is to give spIsoNet the full path to its python executable. Here are some tests:

$ which python
/gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python
$ /gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python -m pip list | grep blush
relion-blush 0.0.1
$ /gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python -m pip list | grep spisonet

$ /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python -m pip list | grep blush
$ /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python -m pip list | grep spisonet
spIsoNet 1.0

Each python executable knows the packages of its own environment, so there should be no clashes between different conda environments. To use the spIsoNet wrapper, you do not have to activate the spIsoNet conda environment.
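The per-interpreter pip checks above can also be scripted by asking each interpreter, via its full path, whether it can locate a package. The sketch below runs importlib.util.find_spec inside a subprocess of the given interpreter; the function name is hypothetical, and the import name spIsoNet is taken from the site-packages/spIsoNet path in the traceback.

```python
import subprocess
import sys


def has_package(python_exe: str, module: str) -> bool:
    """Return True if the given interpreter can locate `module` in its own environment."""
    # The probe runs inside `python_exe`, so it sees that interpreter's sys.path,
    # not the sys.path of the python running this script.
    probe = (
        "import importlib.util, sys; "
        f"sys.exit(0 if importlib.util.find_spec({module!r}) else 1)"
    )
    result = subprocess.run([python_exe, "-c", probe], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Pass interpreter paths as arguments; defaults to the current interpreter.
    for exe in sys.argv[1:] or [sys.executable]:
        print(f"{exe}: spIsoNet importable -> {has_package(exe, 'spIsoNet')}")
```

Given the pip output above, one would expect True for the spisonet-1.0.0 python and False for the relionconda-5.0.1 python.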

So instead of setting
export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE='python /fullpath_to_spisonet_wrapper/relion_wrapper.py'
(which ends up using the RELION 5 python), you set:
export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE='/fullpath_to_spisonet_python/python /fullpath_to_spisonet_wrapper/relion_wrapper.py'
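According to the error message in run.err, RELION runs the external reconstruction as a single system call: the configured executable string followed by the path to the _external_reconstruct.star file. With a bare python, that call resolves to whichever python comes first on PATH (here the RELION 5 one). The sketch below is a hypothetical helper, not part of RELION or spIsoNet, that builds the variable from absolute paths and previews the resulting call; the /fullpath_... paths are the placeholders from the comment above.

```python
import os
import shlex


def external_reconstruct_executable(python_path: str, wrapper_path: str) -> str:
    """Build a RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE value from absolute paths."""
    for p in (python_path, wrapper_path):
        if not os.path.isabs(p):
            # A bare 'python' would be resolved via PATH, i.e. possibly the wrong env.
            raise ValueError(f"use an absolute path, got: {p}")
    return shlex.join([python_path, wrapper_path])


def preview_call(executable: str, star_path: str) -> str:
    """Approximate the command seen in run.err: executable, a space, then the STAR file."""
    return f"{executable} {star_path}"


if __name__ == "__main__":
    exe = external_reconstruct_executable(
        "/fullpath_to_spisonet_python/python",
        "/fullpath_to_spisonet_wrapper/relion_wrapper.py",
    )
    print(f"export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE='{exe}'")
    print(preview_call(exe, "Refine3D/job001/run_it001_half1_class001_external_reconstruct.star"))
```

Rejecting relative paths up front is the design choice here: it turns the silent "wrong python from PATH" situation described above into an immediate error.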

In the RELION GUI I set Reference -> Use Blush regularisation? -> No, and the job technically runs to the end, producing an output mrc file.