aminzabardast closed this issue 1 year ago.
Hi, @aminzabardast
I don't know which PyTorch version you used. Please make sure you have installed the correct version. If it still does not work, please provide more details.
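A quick way to confirm which versions are actually installed, whether inside a container or locally, is to print them from Python. The sketch below reads standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()`); the helper name `collect_env_info` is hypothetical. PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report.

```python
import platform


def collect_env_info():
    """Return a dict describing the Python / PyTorch / CUDA setup (hypothetical helper)."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    # cuDNN version PyTorch will use, or None if cuDNN is unavailable
    info["cudnn"] = torch.backends.cudnn.version() if info["cuda_available"] else None
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Running this both in the container and on a local machine makes it easy to see whether the PyTorch build, its CUDA version, and cuDNN actually match the expected requirements.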
Hi, @GewelsJI. Thank you for the quick response.
I matched all the requirements by running everything in a Docker container, based on this image from Docker Hub.
My Dockerfile:
FROM pytorch/pytorch:1.3-cuda10.1-cudnn7-runtime
LABEL authors="amin"
RUN conda install pytorch=1.3.1 torchvision=0.4.2 cudatoolkit=10.0 --yes
RUN pip install opencv-python==3.4.2.17 tensorboardX==2.0
Training on the CPU runs correctly (although slowly), but training on the GPU triggers this issue. I forked the repository, and all my changes are in the fork.
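Since training works on the CPU but fails on the GPU, one generic way to isolate the problem is to run the same forward/backward step on each device with autograd anomaly detection enabled. This is a sketch under stated assumptions: the toy model, tensor shapes, loss, and the helper `training_step` are hypothetical stand-ins, not the network or training script from this repository. `torch.autograd.detect_anomaly()` is a scoped alternative to calling `torch.autograd.set_detect_anomaly(True)` globally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def training_step(device: str, seed: int = 0) -> float:
    """Run one forward/backward pass of a hypothetical toy model on `device`."""
    torch.manual_seed(seed)
    # Build the model on the CPU first so both devices start from identical weights
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 1, kernel_size=1),
    ).to(device)
    # Generate inputs on the CPU, then move them, so CPU and GPU runs see the same data
    images = torch.randn(2, 3, 32, 32).to(device)
    masks = torch.rand(2, 1, 32, 32).to(device)

    # Inside this block, backward() raises if a backward function returns NaN,
    # and the error includes the traceback of the forward op responsible
    with torch.autograd.detect_anomaly():
        preds = model(images)
        loss = F.binary_cross_entropy_with_logits(preds, masks)
        loss.backward()
    return loss.item()


if __name__ == "__main__":
    print("cpu loss:", training_step("cpu"))
    if torch.cuda.is_available():
        print("cuda loss:", training_step("cuda"))
```

With identical seeds and inputs, the CPU and CUDA losses should agree closely; an exception that appears only on CUDA, or a large mismatch, would suggest a device-specific op or a CUDA/cuDNN build problem rather than a bug in the training logic itself.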
Hi, @aminzabardast
Could you verify it in a local environment? I have not tested it with a Docker image.
@GewelsJI Unfortunately, matching the exact CUDA/cuDNN requirements is a challenge. But conceptually, there should be no difference between running in a Docker container and running locally.
@aminzabardast Agreed. But I don't have the Docker experience to help you with this. Alternatively, you could try my latest project: https://github.com/GewelsJI/DGNet/tree/main/lib_pytorch
I run into errors when I try to train the network on the GPU.
After adding
torch.autograd.set_detect_anomaly(True)
to Training.py, the following error appears: