IsoNet-cryoET / spIsoNet

Overcoming the preferred orientation problem in cryoEM with self-supervised deep-learning
https://www.biorxiv.org/content/10.1101/2024.04.11.588921v1
MIT License

Cannot find .pt file #10

Open ThisIsIt42 opened 6 months ago

ThisIsIt42 commented 6 months ago

I used the following command:

spisonet.py reconstruct P76_J435_map_half_1.mrc P76_J435_map_half_2.mrc --aniso_file FSC3D.mrc --mask P76_J435mask.mrc --limit_res 3.2 --epochs 30 --alpha 1 --beta 0.5 --output_dir output_1 --gpuID B2,B3 --acc_batches 2

and I get the following error:

04-22 14:05:11, INFO voxel_size 1.309999942779541
04-22 14:05:13, INFO spIsoNet correction until resolution 3.2A! Information beyond 3.2A remains unchanged
04-22 14:05:48, INFO Start preparing subvolumes!
04-22 14:05:55, INFO Done preparing subvolumes!
04-22 14:05:55, INFO Start training!
04-22 14:05:57, INFO Port number: 43493
learning rate 0.0003
['..P76_J435_map_half_1_data', '..P76_J435_map_half_2_data']
Traceback (most recent call last):
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 553, in <module>
    exit(main())
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 549, in main
    fire.Fire(ISONET)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 182, in reconstruct
    map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
    network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/models/network_n2n.py", line 273, in train
    checkpoint = torch.load(model_path)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 998, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 445, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 426, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '..output_1/P76_J435_maphalf.pt'

Any idea what the issue is?

procyontao commented 6 months ago

Hi,

This could be a bug in spIsoNet related to the name of the output folder. How about changing "--output_dir output_1" to "--output_dir output1", or not specifying "--output_dir" at all?

ThisIsIt42 commented 6 months ago

$ spisonet.py reconstruct P76_J435_map_half_1.mrc P76_J435_map_half_2.mrc --aniso_file FSC3D.mrc --mask P76_J435mask.mrc --limit_res 3.2 --epochs 30 --alpha 1 --beta 0.5 --gpuID B2,B3 --acc_batches 2
04-22 15:22:00, INFO voxel_size 1.309999942779541
04-22 15:22:02, INFO spIsoNet correction until resolution 3.2A! Information beyond 3.2A remains unchanged
04-22 15:22:37, INFO Start preparing subvolumes!
04-22 15:22:42, INFO Done preparing subvolumes!
04-22 15:22:42, INFO Start training!
04-22 15:22:44, INFO Port number: 40421
learning rate 0.0003
['isonet_maps/P76_J435_map_half_1_data', 'isonet_maps/P76_J435_map_half_2_data']
Traceback (most recent call last):
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 553, in <module>
    exit(main())
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 549, in main
    fire.Fire(ISONET)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 182, in reconstruct
    map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
    network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/models/network_n2n.py", line 273, in train
    checkpoint = torch.load(model_path)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 998, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 445, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 426, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'isonet_maps/P76_J435_maphalf.pt'

So I tried both suggestions and received the same error. I inspected the subvolumes folder, and there don't seem to be any .pt files.
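Both tracebacks end in torch.load(model_path) raising FileNotFoundError, so the training step expected a checkpoint file that was never written. A minimal sketch for confirming that across a whole folder tree, assuming only that checkpoints use the .pt extension; `find_checkpoints` is a hypothetical helper, not part of spIsoNet, and the default folder name is simply the isonet_maps/ directory seen in the second run's log.

```python
import sys
from pathlib import Path


def find_checkpoints(search_root: str) -> list[Path]:
    """Return every *.pt file under search_root (recursively), sorted by path.

    Hypothetical helper: it does not know spIsoNet's internal layout, so it
    searches the entire tree below the given folder.
    """
    root = Path(search_root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist or is not a directory")
    return sorted(p for p in root.rglob("*.pt") if p.is_file())


if __name__ == "__main__":
    # Folder to search: first CLI argument, else the default seen in the log.
    folder = sys.argv[1] if len(sys.argv) > 1 else "isonet_maps"
    found = find_checkpoints(folder)
    if found:
        for path in found:
            print(path, path.stat().st_size, "bytes")
    else:
        # An empty result suggests training stopped before saving a checkpoint.
        print(f"No .pt checkpoint under {folder}")
```

If nothing is found, the actual failure most likely happened earlier in the training process than the torch.load call, so the FileNotFoundError is a symptom rather than the root cause.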

procyontao commented 6 months ago

Hi,

What do you mean by "--gpuID B2,B3"? Usually "--gpuID" should be something like 0,1 or 0,1,2,3; alternatively, leave it unspecified to use all available GPUs.
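The expected --gpuID format is a comma-separated list of integer device indices. A minimal sketch of a format check, written only from the examples in this comment; `parse_gpu_ids` is a hypothetical validator and not spIsoNet's own argument parser.

```python
def parse_gpu_ids(gpu_id: str) -> list[int]:
    """Parse a --gpuID style string such as "0,1" into integer device indices.

    Hypothetical validator: it accepts only comma-separated non-negative
    integers (e.g. "0,1" or "0,1,2,3") and rejects labels like "B2,B3".
    """
    ids = []
    for token in gpu_id.split(","):
        token = token.strip()
        # isascii() guards against Unicode digits that int() would reject.
        if not (token.isascii() and token.isdigit()):
            raise ValueError(
                f"invalid GPU id {token!r}: expected an integer index such as 0 or 1"
            )
        ids.append(int(token))
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU ids in {gpu_id!r}")
    return ids


if __name__ == "__main__":
    print(parse_gpu_ids("0,1"))  # [0, 1]
    try:
        parse_gpu_ids("B2,B3")
    except ValueError as err:
        print(err)
```

Note that these integers are CUDA device ordinals. By default CUDA may enumerate devices in a different order from nvidia-smi; setting the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two orderings agree.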

ThisIsIt42 commented 6 months ago

Initially, when I ran with GPU IDs 0,1,2, I got the error "Triton Error [CUDA]: device kernel image is invalid". So I looked up the IDs of the GPUs.

ThisIsIt42 commented 6 months ago

So I removed the GPU flag. The port number was printed, but the run still failed. I checked our CUDA version and it seems to be 11.2. Could this be the cause of the failure?
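"CUDA version 11.2" can mean different things: the toolkit reported by nvcc --version, or the "CUDA Version" field in the nvidia-smi header, which is the newest CUDA version the installed driver supports. A rough sketch for comparing the driver figure against the CUDA version PyTorch was built with, assuming nvidia-smi is on PATH; the install path (1.0_cu11.8) suggests a CUDA 11.8 build, but that is an inference from the path. All function names are hypothetical. Treat the result as a hint only, since CUDA 11.x minor-version compatibility means an older driver is not always fatal.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_version(text: str) -> Version:
    """Turn "11.8" (or "11.8.89") into (11, 8)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_cuda_version(smi_output: str) -> Optional[Version]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output, if present."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return parse_version(match.group(1)) if match else None


def torch_cuda_version() -> Optional[Version]:
    """CUDA version the installed PyTorch was built against (None if unknown)."""
    try:
        import torch
    except ImportError:
        return None
    return parse_version(torch.version.cuda) if torch.version.cuda else None


def check(smi_output: str, build: Optional[Version]) -> str:
    """Compare the driver's supported CUDA version with the PyTorch build."""
    driver = driver_cuda_version(smi_output)
    if driver is None or build is None:
        return "could not determine one of the versions"
    if driver < build:
        return (f"driver supports CUDA {driver[0]}.{driver[1]} "
                f"but PyTorch was built for {build[0]}.{build[1]}")
    return "driver is at least as new as the PyTorch CUDA build"


if __name__ == "__main__":
    smi = shutil.which("nvidia-smi")
    out = subprocess.run([smi], capture_output=True, text=True).stdout if smi else ""
    print(check(out, torch_cuda_version()))
```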

procyontao commented 6 months ago

The CUDA version needs to match the PyTorch version and the NVIDIA driver. Perhaps you should choose an older version of PyTorch? https://pytorch.org/get-started/previous-versions/
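Before picking a build from that page, it may help to record what the current environment reports. A minimal sketch using only public torch APIs (torch.version.cuda, torch.cuda.get_arch_list, torch.cuda.get_device_capability); `describe_torch_build` is a hypothetical helper, and it returns {"torch": None} when PyTorch is not importable.

```python
from pprint import pprint


def describe_torch_build() -> dict:
    """Collect PyTorch/CUDA details useful when choosing a matching build."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "compiled_arches": torch.cuda.get_arch_list(),  # e.g. sm_80 entries
        "devices": [],
    }
    if info["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(index)
            info["devices"].append({
                "index": index,
                "name": torch.cuda.get_device_name(index),
                "capability": f"sm_{major}{minor}",
            })
    return info


if __name__ == "__main__":
    pprint(describe_torch_build())
```

Comparing each device's capability against compiled_arches, together with built_for_cuda and the driver version, narrows down which entry on the previous-versions page fits the machine.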

flng000 commented 5 months ago

Hi, I have the same problem (Triton Error [CUDA]: device kernel image is invalid). Which version of PyTorch would you recommend? Thanks and all the best

shl4014 commented 4 months ago

I also got the same error as ThisIsIt42 and tried removing the --output_dir and --gpuID flags, but this did not help. Are there any other ideas on how to solve this issue? Thanks! Shifra