This is a template for reporting bugs. Please fill in as much information as you can.
Describe your problem
Hi, I am getting the following error when trying to use the blush regularisation in 3D classification.
Environment:
OS: Ubuntu 22.04
MPI runtime: OpenMPI 4.0.3
RELION-5.0-beta-3
Memory: 64G
GPU: 2 x RTX4090
Dataset:
Box size: 600 px
Pixel size: 0.3943 Å
Number of particles:200,000
Description: Dimer 150 kDa
Job options:
Type of job: 3D classification
Number of MPI processes: 5
Number of threads: 6
Full command (see note.txt in the job directory): `which relion_refine_mpi` --continue Class3D/job264/run_it007_optimiser.star --o Class3D/job273/run --dont_combine_weights_via_disc --scratch_dir /scr --pool 3 --pad 2 --iter 25 --tau2_fudge 4 --particle_diameter 235 --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --j 6 --gpu "0,1" --pipeline_control Class3D/job273/
Error message:
run.out
Expectation iteration 1 of 25
2.47/2.47 min ............................................................~~(,,">
Maximization (with Blush regularization)...
000/??? sec ~~(,,"> [oo]
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 499754 RUNNING AT sn4622120434
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
run.err
Invalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 keyCould not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
/data/path/newrelion5/bin/relion_python_blush: line 36: 501058 Aborted (core dumped) TORCH_HOME="$torch_home" "$python_executable" -c "from relion_blush import main; exit(main())" "$@"
Something went wrong in the external Python call...
Command: relion_python_blush Class3D/job270/run_it001_class001_external_reconstruct.star --gpu 0,1,0,1,
---------------------------------- PYTHON ERROR ---------------------------------
Has RELION been provided a Python interpreter with the correct environment?
The interpreter can be passed to RELION either during Cmake configuration by
using the Cmake flag -DPYTHON_EXE_PATH=<path/to/python/interpreter>.
NOTE: For some modules TORCH_HOME needs to be set to find pretrained models
Using python executable: /home/cryosparc_user/mambaforge/envs/relion-5.0/bin/python
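The run.err output above reports that `libcuda.so` cannot be opened. As a minimal sketch of a check that could be run with the Python interpreter listed above, the following tries to open the library through the dynamic loader. It uses only the standard `ctypes` module; the helper name `can_load` is hypothetical and not part of RELION, and `libcuda.so.1` is included only on the assumption that the driver may be installed under its versioned name.

```python
import ctypes


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or not on the loader search path.
        return False


if __name__ == "__main__":
    # libcuda.so is the name from the error message; libcuda.so.1 is an
    # assumed alternative (the versioned name a driver install may provide).
    for lib in ("libcuda.so", "libcuda.so.1"):
        print(lib, "loadable" if can_load(lib) else "NOT loadable")
```

If only the versioned name loads, that would suggest the unversioned `libcuda.so` is not on the loader search path in this environment, but this is a guess rather than a confirmed diagnosis.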