About running training code on a GPU

Zibin-Z commented 3 years ago

I only have one GPU, and I have successfully run test.py. But when I try to run train***.py, it reports an error. After searching for information I added some parameters, but running it again also reports an error.

I don’t know how to solve this error so far. How can I solve this problem?

geyuying commented 3 years ago

Our training code uses distributed data parallel (torch.nn.parallel.DistributedDataParallel) in PyTorch. If you want to train the model with one GPU, you need to remove the DistributedDataParallel wrapper and move the model to the GPU just by calling model.cuda(). You also need to change how the data is loaded; you can refer to https://github.com/switchablenorms/DeepFashion_Try_On/blob/master/ACGPN_train/train.py
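A minimal sketch of what that single-GPU setup could look like. This is not the PF-AFN training script: the dataset, model, and hyperparameters below are hypothetical stand-ins, and I am assuming that the data-loading change mainly means dropping the distributed sampler and letting the DataLoader shuffle on its own. It falls back to the CPU when no GPU is visible so that it can be run anywhere.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Use the single GPU if one is present; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the real dataset and network.
dataset = TensorDataset(torch.randn(16, 8), torch.randn(16, 1))
model = nn.Linear(8, 1)

# Single-device setup: no torch.distributed.init_process_group call,
# no DistributedDataParallel wrapper, and no DistributedSampler.
model = model.to(device)  # same effect as model.cuda() when device is a GPU
loader = DataLoader(dataset, batch_size=4, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

losses = []
for inputs, targets in loader:
    # Move each batch to the same device as the model.
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

If the original script reads a local rank from the command line or calls torch.cuda.set_device with it, those lines would presumably have to go as well, since nothing launches multiple processes any more.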

Zibin-Z commented 3 years ago

I modified the way the data is loaded and removed the corresponding places according to your guidance, as shown in the following figure. But it didn't solve the problem:

TypeError: 'int' object is not iterable

Through debugging, I saw that the problem occurs at the position shown in the picture below. The object here should be of type img, but it appears as an ndarray. The error is reported here. Do you know the reason?

geyuying / PF-AFN

About running training code on a GPU #32