Open davidhoover opened 3 months ago
I figured out a constellation of versions that works. I first needed to install using this yml file:
name: spisonet
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pytorch
  - pytorch-cuda=11.8
  - numpy=1.26.4
  - setuptools=68.0.0
  - mkl=2024.0
  - pip
  - pip:
    - scikit-image
    - matplotlib
    - mrcfile
    - fire
    - tqdm
    - .
I've attached the list of packages in my conda environment in case anyone else runs into this problem. n.txt
Hi,
Sorry for the late reply. Please see this issue: https://github.com/IsoNet-cryoET/spIsoNet/issues/13#issuecomment-2122749336
They used multiple GPUs, which avoids this error.
However, I cannot always reproduce this with one GPU. It is related to torch distributed data parallel and needs to be fixed.
I recently installed spisonet and attempted to run the tutorial. PyTorch crashed immediately during training with these errors:
What version of torch is required? We have 2.3.1+cu118. This was run on a single P100 GPU.
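For anyone comparing setups, a minimal sketch like the one below prints the details asked about here: the torch release, the CUDA build tag (for example the `cu118` in `2.3.1+cu118`), and how many GPUs are visible. The helper names `split_torch_version` and `describe_environment` are made up for illustration and are not part of spIsoNet. It assumes nothing beyond a Python interpreter, and reports torch as missing if it cannot be imported.

```python
from __future__ import annotations


def split_torch_version(version: str) -> tuple[str, str | None]:
    """Split a version string such as '2.3.1+cu118' into ('2.3.1', 'cu118')."""
    release, sep, local = version.partition("+")
    # If there is no '+', there is no build tag to report.
    return release, (local if sep else None)


def describe_environment() -> dict:
    """Collect torch/CUDA details; returns {'torch': None} if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    release, build = split_torch_version(torch.__version__)
    # Only query devices when CUDA is usable in this process.
    n_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
    return {
        "torch": release,
        "build": build,
        "cuda_runtime": torch.version.cuda,
        "gpu_count": n_gpus,
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(n_gpus)],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output alongside the error makes it easier to tell whether a crash happens in a single-GPU or multi-GPU setup.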