Closed RzzzLiu closed 1 month ago
Hello, and thank you for your interest. The README gives a reference procedure for setting up the environment, reproduced below.
git clone git@github.com:BBBBchan/CorrMatch.git
cd CorrMatch
conda create -n corrmatch python=3.9
conda activate corrmatch
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install opencv-python tqdm einops pyyaml
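To confirm that the environment matches the versions above, a small check like the following may help. It is only a sketch: the helper names and the EXPECTED table are illustrative and not part of CorrMatch, and it assumes the packages expose standard distribution metadata under the names torch, torchvision, and torchaudio.

```python
from importlib import metadata

# Versions pinned in the install commands above.
# The distribution names used as keys are an assumption.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def installed_version(name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to a tuple (expected, found, ok).

    Local version tags such as '+cu117' are ignored in the comparison.
    """
    report = {}
    for name, want in expected.items():
        found = installed_version(name)
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found}, {status}")
```

Run it inside the activated corrmatch environment; a MISMATCH line points at the package to reinstall.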
Thank you very much!
/home/xxx/miniconda3/envs/corrmatch/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
*****
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "/home/xxx/CorrMatch-main/corrmatch.py", line 310, in
corrmatch.py FAILED
Other Failures:
[1]:
  time: 2024-09-24_21:30:23
  rank: 1 (local_rank: 1)
  exitcode: 1 (pid: 1248846)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
I still get this error. Is it caused by the GPUs? I am training on two GPUs, and the command follows the README.
Hello. Judging from the error message:
torch.cuda.set_device(rank % num_gpus)
ZeroDivisionError: integer division or modulo by zero
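it looks like no GPU was detected, so num_gpus is 0. You can run the following script on its own to check whether CUDA is installed correctly and whether the GPUs can be detected.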
import torch
flag = torch.cuda.is_available()
print(flag)
num_gpus = torch.cuda.device_count()
print(num_gpus)
With two GPUs, the output of the script above should be:
True
2
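If the script above runs successfully, the cause may be that your GPU IDs do not follow the default numbering that starts from 0. Line 32 of corrmatch.py, os.environ["CUDA_VISIBLE_DEVICES"] = "0,1", specifies which GPU IDs are visible to the program.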
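As a rough illustration of how this failure could be made easier to diagnose, the sketch below parses CUDA_VISIBLE_DEVICES and guards the rank % num_gpus computation that raised the ZeroDivisionError. The helper names visible_gpu_ids and local_device_index are hypothetical and do not exist in corrmatch.py; the sketch is pure Python and leaves the torch calls as comments.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of ID strings.

    Returns None when the variable is unset, meaning no restriction is applied.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def local_device_index(rank, num_gpus):
    """Map a global rank to a local device index, failing clearly when no GPU is visible."""
    if num_gpus <= 0:
        raise RuntimeError(
            "No GPU is visible to this process; check the CUDA installation "
            f"and CUDA_VISIBLE_DEVICES (currently {visible_gpu_ids()!r})."
        )
    return rank % num_gpus


# Intended use inside a training script (assumes torch is installed):
#   import torch
#   torch.cuda.set_device(local_device_index(rank, torch.cuda.device_count()))
```

With this guard, an empty device list produces an error that names CUDA_VISIBLE_DEVICES instead of a bare ZeroDivisionError.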
Yes, adjusting line 32 fixed it. Thank you very much!!
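Hello, could you share the requirements for setting up the virtual environment? I tried running your code in the UniMatch environment, but it does not run. Could you provide the environment configuration? Thank you very much!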