ThisIsIt42 opened this issue 6 months ago (status: Open)
Hi,
This could be a bug in spIsoNet related to the name of the output folder. How about changing "--output_dir output_1" to "--output_dir output1", or not specifying "--output_dir" at all?
$ spisonet.py reconstruct P76_J435_map_half_1.mrc P76_J435_map_half_2.mrc --aniso_file FSC3D.mrc --mask P76_J435mask.mrc --limit_res 3.2 --epochs 30 --alpha 1 --beta 0.5 --gpuID B2,B3 --acc_batches 2
04-22 15:22:00, INFO voxel_size 1.309999942779541
04-22 15:22:02, INFO spIsoNet correction until resolution 3.2A!
Information beyond 3.2A remains unchanged
04-22 15:22:37, INFO Start preparing subvolumes!
04-22 15:22:42, INFO Done preparing subvolumes!
04-22 15:22:42, INFO Start training!
04-22 15:22:44, INFO Port number: 40421
learning rate 0.0003
['isonet_maps/P76_J435_map_half_1_data', 'isonet_maps/P76_J435_map_half_2_data']
Traceback (most recent call last):
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 553, in <module>
So I tried both suggestions and received the same error. I inspected the subvolumes folder, and there don't seem to be any .pt files.
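To double-check the observation that no .pt files were written, a small script can list every checkpoint under a folder. This is a minimal sketch, not part of spIsoNet; the helper name find_checkpoints is hypothetical, and the .pt extension is taken from the missing file in the traceback further down.

```python
from pathlib import Path
from typing import List


def find_checkpoints(root: str) -> List[str]:
    """Return every *.pt file below `root` as sorted, '/'-separated relative paths."""
    base = Path(root)
    if not base.is_dir():
        # Treat a missing folder as "no checkpoints" instead of raising.
        return []
    # rglob also searches subfolders, in case checkpoints land in a nested directory.
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*.pt") if p.is_file()
    )
```

For example, find_checkpoints("output_1") returns an empty list when nothing was saved there. A missing folder is also reported as empty, so the same call works before and after a run.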
Hi,
I would like to ask what you mean by "--gpuID B2,B3". Usually "--gpuID" should be something like 0,1 or 0,1,2,3; alternatively, leave it unspecified to use all available GPUs.
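The maintainer's point is that --gpuID expects integer device indices. The hypothetical helper below is not spIsoNet's own argument parser; it only sketches that kind of check, accepting "0,1" and rejecting "B2,B3" before any GPU work starts.

```python
from typing import List


def parse_gpu_ids(spec: str) -> List[int]:
    """Parse a comma-separated GPU list such as "0,1" into integer device indices.

    Hypothetical helper: rejects anything that is not a non-negative integer
    (for example "B2") and rejects duplicated indices.
    """
    ids: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if not token.isdecimal():
            raise ValueError(f"GPU id {token!r} is not a non-negative integer index")
        ids.append(int(token))
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU ids in {spec!r}")
    return ids
```

If a cluster scheduler assigns GPUs under its own labels, it typically exposes them to the job through CUDA_VISIBLE_DEVICES, and inside the process the visible devices are renumbered from 0. In that case "--gpuID 0,1" would address the first two GPUs granted to the job.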
Initially, when I ran with GPU IDs 0,1,2, I got an error saying: Triton Error [CUDA]: device kernel image is invalid. So I looked up the IDs of the GPUs.
I then removed the GPU flag; the run got as far as printing the port number, but then failed. I checked our CUDA version and it seems to be 11.2. Could this be the cause of the failure?
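Before changing versions, it helps to record what PyTorch actually sees. The sketch below uses only standard attributes (torch.version.cuda, torch.cuda.get_arch_list, torch.cuda.get_device_capability, triton.__version__); the function name cuda_report is hypothetical. An error such as "device kernel image is invalid" often points to a mismatch between the CUDA build, the driver and the GPU, so these values are a reasonable first thing to share in the thread.

```python
from typing import Any, Dict


def cuda_report() -> Dict[str, Any]:
    """Collect the PyTorch/CUDA facts that help diagnose GPU errors like this one."""
    try:
        import torch
    except ImportError:
        # Still return something useful on a machine without PyTorch.
        return {"torch": None}
    info: Dict[str, Any] = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    try:
        import triton  # the failing kernels in this thread come from Triton

        info["triton"] = triton.__version__
    except ImportError:
        info["triton"] = None
    if info["cuda_available"]:
        # GPU architectures this PyTorch build ships compiled kernels for, e.g. "sm_80".
        info["compiled_arches"] = torch.cuda.get_arch_list()
        for i in range(torch.cuda.device_count()):
            info["devices"].append({
                "index": i,
                "name": torch.cuda.get_device_name(i),
                "capability": torch.cuda.get_device_capability(i),
            })
    return info


if __name__ == "__main__":
    import pprint

    pprint.pprint(cuda_report())
```

Running it inside the same environment that runs spisonet.py matters, since a different Python environment may have a different PyTorch build.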
The CUDA version needs to match the PyTorch version and the NVIDIA driver. Perhaps you want to choose an older version of PyTorch? https://pytorch.org/get-started/previous-versions/
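One quick way to act on this advice is to compare the CUDA version PyTorch was built for (torch.version.cuda; the install path 1.0_cu11.8 in the traceback suggests 11.8) with the highest CUDA version the driver supports, which nvidia-smi prints in its header. This is a sketch under those assumptions; both helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract 'CUDA Version: X.Y' from nvidia-smi's header; None if it is absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def build_newer_than_driver(torch_cuda: str, driver: Tuple[int, int]) -> bool:
    """True if PyTorch was built for a newer CUDA (e.g. "11.8") than the driver reports.

    `torch_cuda` is the string from torch.version.cuda, which is None on
    CPU-only builds, so callers should skip the check in that case.
    """
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    return (major, minor) > driver
```

Usage: pass subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout to driver_cuda_version, then call build_newer_than_driver(torch.version.cuda, driver). A True result is a red flag rather than a verdict, because CUDA minor-version compatibility can let an 11.8 build run on an older 11.x driver with some features unavailable. Also note that nvcc --version reports the locally installed toolkit, whereas the prebuilt PyTorch binaries bring their own CUDA runtime, so the nvidia-smi value is the more relevant one here.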
Hi, I have the same trouble (Triton Error [CUDA]: device kernel image is invalid). Which version of PyTorch would you recommend? Thanks, and all the best.
I also got the same error as ThisIsIt42 and tried removing the --output_dir and --gpuID flags, but this did not help. Are there any other ideas on how to solve this issue? Thanks! Shifra
I use the following command:
spisonet.py reconstruct P76_J435_map_half_1.mrc P76_J435_map_half_2.mrc --aniso_file FSC3D.mrc --mask P76_J435mask.mrc --limit_res 3.2 --epochs 30 --alpha 1 --beta 0.5 --output_dir output_1 --gpuID B2,B3 --acc_batches 2
and I get the following error:
04-22 14:05:11, INFO voxel_size 1.309999942779541
04-22 14:05:13, INFO spIsoNet correction until resolution 3.2A!
Information beyond 3.2A remains unchanged
04-22 14:05:48, INFO Start preparing subvolumes!
04-22 14:05:55, INFO Done preparing subvolumes!
04-22 14:05:55, INFO Start training!
04-22 14:05:57, INFO Port number: 43493
learning rate 0.0003
['..P76_J435_map_half_1_data', '..P76_J435_map_half_2_data']
Traceback (most recent call last):
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 553, in <module>
exit(main())
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 549, in main
fire.Fire(ISONET)
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/bin/spisonet.py", line 182, in reconstruct
map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/spIsoNet/spIsoNet/models/network_n2n.py", line 273, in train
checkpoint = torch.load(model_path)
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 998, in load
with _open_file_like(f, 'rb') as opened_file:
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 445, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/programs/x86_64-linux/spisonet/1.0_cu11.8/mamba/lib/python3.10/site-packages/torch/serialization.py", line 426, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '..output_1/P76_J435_maphalf.pt'
Any idea what the issue is?
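The traceback ends in torch.load(model_path) inside network_n2n.py's train, so the checkpoint that step expects was never written; one plausible reading is that whatever should have produced it failed earlier in the run. A guard like the hypothetical check_checkpoint below, which is not part of spIsoNet, turns the bare FileNotFoundError into a message that also shows what the output folder does contain.

```python
from pathlib import Path


def check_checkpoint(path: str) -> Path:
    """Return `path` as a Path if the file exists; otherwise raise a descriptive error."""
    p = Path(path)
    if p.is_file():
        return p
    parent = p.parent
    if parent.is_dir():
        # Show what the run did leave behind next to the expected checkpoint.
        listing = sorted(child.name for child in parent.iterdir())
        detail = f"{parent.resolve()} contains: {listing or 'nothing'}"
    else:
        detail = f"folder {parent.resolve()} does not exist"
    raise FileNotFoundError(f"expected checkpoint {p} was never written; {detail}")
```

It would be used as torch.load(check_checkpoint(model_path)); torch.load accepts a pathlib.Path as well as a string.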