Open von-elfen opened 1 week ago
Hi,
Thanks for your attention! Please try pulling the latest update from this repo. I updated the training code to suppress the warning and to show the training progress bar between iterations; the bar was missing in the previous version, which may have been misleading. The warning should not affect the final performance. Please also check whether there is any output in checkpoints/empair-10028-test (our example) or in your customized output folder.
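One quick way to check whether training wrote anything is to list the files under the output folder. This is only a minimal sketch: list_outputs is a hypothetical helper, not part of CryoGEM, and the folder name is the example from this thread.

```python
from pathlib import Path


def list_outputs(folder):
    """Return the relative paths of all files under `folder` (hypothetical helper).

    An empty list means the folder is missing or contains no files.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Example folder from this thread; replace it with your own output folder.
    files = list_outputs("checkpoints/empair-10028-test")
    if files:
        print("\n".join(files))
    else:
        print("no output files found")
```

An empty result only says that nothing has been written yet; it does not tell you why.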
Cheers, Jiakai
Hello,
Sorry for the confusion. I forgot to copy the error message from the last line: Segmentation fault (core dumped)
git clone https://github.com/Cellverse/CryoGEM.git
cd CryoGEM
conda create -n cryogem python=3.11 -y
conda activate cryogem
pip install -e .
cryogem gen_data --mode homo --device cuda:0 \
--input_map testing/data/exp_abinitio_volumes/densitymap.10028.90.mrc \
--save_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/ \
--n_micrographs 100 --particle_size 90 --mask_threshold 0.9
cryogem gen_data --mode homo --device cuda:0 \
--input_map testing/data/exp_abinitio_volumes/densitymap.10028.90.mrc \
--save_dir save_images/gen_data/Ribosome\(10028\)/testing_dataset/ \
--n_micrographs 1000 --particle_size 90 --mask_threshold 0.9
cryogem esti_ice --apix 5.36 --device cuda:0 \
--input_dir testing/data/Ribosome\(10028\)/real_data/ \
--save_dir save_images/esti_ice/Ribosome\(10028\)/
cryogem train --name empair-10028-test --max_dataset_size 100 --apix 5.36 --gpu_ids 0 \
--real_dir testing/data/Ribosome\(10028\)/real_data/ \
--sync_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/mics_mrc \
--mask_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/particles_mask \
--weight_map_dir save_images/esti_ice/Ribosome\(10028\)/
I got stuck at step 3 (training):
Loading real_A: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 223.02it/s]
Loading weight cards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 935/935 [00:03<00:00, 248.08it/s]
Loading real_B: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 53.67it/s]
(INFO) (cryogem_model.py) (29-Oct-24 22:10:59) Sampler type: mask_sample
(INFO) (cryogem_model.py) (29-Oct-24 22:10:59) 1,2,3,4,5 model [CryoGEMModel] was created
Epoch 1/25, iters: 0/100: 0%| | 0/100 [00:00<?, ?it/s]Segmentation fault (core dumped)
My platform is Ubuntu 22.04 with a 7900X CPU and a 4060 Ti GPU; the drivers and CUDA work fine.
I do not understand the "Segmentation fault (core dumped)" message. I tried reinstalling and running the command again, but the error was still there.
Looking forward to your advice.
Best Regards.
Hi,
It seems you hit an error in compiled C/C++ code called from the main training loop of CryoGEM. It may be caused by mismatched PyTorch and CUDA versions in this project's environment.
I suggest that you:
1) Try disabling CUDA and training CryoGEM on the CPU to see whether it runs, and try constructing a CUDA tensor in the terminal to see whether anything goes wrong.
2) Try to locate the error by setting CUDA_LAUNCH_BLOCKING=1 in your terminal and printing some output in the main loop of the training code, commands/train.py (Line 68 - Line 115), to find which line gets stuck.
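The two suggestions above could be tried with a small script like the one below. It is only a sketch under assumptions: check_cuda and trace are hypothetical helpers, not part of CryoGEM, and the device name cuda:0 is taken from the commands in this thread. The script reports the PyTorch version and the CUDA version it was built against, then tries to build a CUDA tensor. The trace helper prints with an explicit flush, so the last message is still visible if the process dies with a segmentation fault right afterwards.

```python
import sys


def trace(message):
    """Print a marker and flush at once, so it survives a hard crash (hypothetical helper)."""
    print(f"[trace] {message}", file=sys.stderr, flush=True)


def check_cuda(device="cuda:0"):
    """Return a small report on the PyTorch/CUDA setup (hypothetical helper)."""
    report = {"torch_installed": False, "cuda_available": False, "tensor_ok": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        try:
            trace("creating a CUDA tensor")
            x = torch.ones(4, device=device)
            trace("running a small CUDA computation")
            report["tensor_ok"] = (x * 2).sum().item() == 8.0
        except RuntimeError as err:
            # CUDA failures that PyTorch can detect are raised as RuntimeError.
            report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in check_cuda().items():
        print(f"{key}: {value}")
```

If the script itself crashes with a segmentation fault, the last [trace] line shows how far it got. Similar trace calls could be placed between the statements of the training loop. Setting the environment variable PYTHONFAULTHANDLER=1 before running the training command also makes Python print a traceback when it receives a segmentation fault.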
Best,
Hi, I successfully installed CryoGEM, downloaded data.zip, and unzipped it under the testing folder. I tried to reproduce the output for the Ribosome(10028) dataset to test my installation, but something went wrong. I ran the following commands:
cryogem gen_data --mode homo --device cuda:0 \
--input_map testing/data/exp_abinitio_volumes/densitymap.10028.90.mrc \
--save_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/ \
--n_micrographs 100 --particle_size 90 --mask_threshold 0.9
# testing dataset
cryogem gen_data --mode homo --device cuda:0 \
--input_map testing/data/exp_abinitio_volumes/densitymap.10028.90.mrc \
--save_dir save_images/gen_data/Ribosome\(10028\)/testing_dataset/ \
--n_micrographs 1000 --particle_size 90 --mask_threshold 0.9
cryogem esti_ice --apix 5.36 --device cuda:0 \
--input_dir testing/data/Ribosome\(10028\)/real_data/ \
--save_dir save_images/esti_ice/Ribosome\(10028\)/
The commands above ran well!
But when I run:
cryogem train --name empair-10028-test --max_dataset_size 100 --apix 5.36 --gpu_ids 0 \
--real_dir testing/data/Ribosome\(10028\)/real_data/ \
--sync_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/mics_mrc \
--mask_dir save_images/gen_data/Ribosome\(10028\)/training_dataset/particles_mask \
--weight_map_dir save_images/esti_ice/Ribosome\(10028\)/
How can I solve the problem?
BW!