My environment is 1 GPU. How do I adjust the training set?

JohnAILove commented 3 years ago

Zibin-Z commented 3 years ago

Hi, my environment is also 1 GPU. Have you successfully run the training files, such as train_PBAFN_stage1.py?

JohnAILove commented 3 years ago

Hi, my environment is also 1 GPU. Have you successfully run the training files, such as train_PBAFN_stage1.py?

No. It fails with:

-------------- End ----------------
THCudaCheck FAIL file=/pytorch/torch/csrc/cuda/Module.cpp line=33 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "train_PBAFN_stage1.py", line 30, in <module>
    torch.cuda.set_device(opt.local_rank)
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /pytorch/torch/csrc/cuda/Module.cpp:33

But I can run the demo.

We can find a solution together.
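For anyone hitting the same traceback: "invalid device ordinal" means the index passed to torch.cuda.set_device() does not name a GPU visible to that process. On a 1-GPU machine only ordinal 0 exists, so a likely cause (an assumption, since the launch command isn't shown here) is that the script was started with more processes than GPUs and some opt.local_rank was 1 or higher. A small check like the one below turns the CUDA error into a readable message. The helper name check_device_ordinal is hypothetical and not part of the repo.

```python
def check_device_ordinal(local_rank, device_count):
    """Return local_rank if it names a visible GPU, otherwise raise a readable error.

    Hypothetical helper for illustration; it is not part of PF-AFN.
    """
    if device_count < 1:
        raise RuntimeError("No CUDA device is visible to this process.")
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank={local_rank} is not a valid GPU index: "
            f"only {device_count} GPU(s) visible, valid indices are "
            f"0..{device_count - 1}. Launch at most one process per GPU."
        )
    return local_rank


# Intended use in the training script (requires PyTorch, so left as a comment):
#   idx = check_device_ordinal(opt.local_rank, torch.cuda.device_count())
#   torch.cuda.set_device(idx)
```

This only diagnoses the problem. It does not make the distributed training code run on one GPU by itself.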

geyuying commented 3 years ago

Our training code uses distributed data parallel (nn.DistributedDataParallel) in PyTorch. If you want to train the model with one GPU, you need to remove DistributedDataParallel and move the model to the GPU simply by calling model.cuda(). You also need to change the way the data is loaded; for that you can refer to https://github.com/switchablenorms/DeepFashion_Try_On/blob/master/ACGPN_train/train.py
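A minimal sketch of what this change looks like in general, not the actual PF-AFN code: the model, dataset, loss, and hyperparameters below are placeholder assumptions chosen only so that the example runs. It drops the DistributedDataParallel wrapper and the distributed sampler, and uses a plain DataLoader with shuffle=True instead. It uses .to(device) rather than model.cuda() so that it also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Use the single GPU if there is one; fall back to CPU so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder model and data (assumptions, standing in for the real networks and dataset).
model = nn.Linear(8, 1)
dataset = TensorDataset(torch.randn(32, 8), torch.randn(32, 1))

# A distributed setup would roughly do:
#   torch.distributed.init_process_group(...)
#   model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
#   loader = DataLoader(dataset, sampler=DistributedSampler(dataset), ...)
# Single-GPU setup: no process group, no DDP wrapper, no DistributedSampler.
model = model.to(device)
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.L1Loss()


def train_one_epoch():
    """Run one pass over the data and return the mean batch loss."""
    model.train()
    total = 0.0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)


epoch_loss = train_one_epoch()
print(f"mean loss: {epoch_loss:.4f}")
```

With this setup the script is started as an ordinary Python process, and any code that reads opt.local_rank to select a device is no longer needed.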

kris-yangjs commented 3 years ago

Have you managed to run it successfully after following Ge's guidance?

Sam1224 commented 3 years ago

Hey guys, I succeeded in running the 1st-stage training code with a single GPU on Windows 10, so maybe you could try my code. Note that I strictly followed the module versions specified by @geyuying.

I only modified 2 files, and you can get my code here. I'm still training the 1st stage, which is costly, but there have been no errors so far.

Wish you success 👍

fun-code-ai commented 2 years ago

Hey friend, I couldn't open this link. Could you please send it to me again? Thanks.

Sam1224 commented 2 years ago

Hi, sorry about that, I cleaned out my Google Drive at some point. You can now get the files via this link. For the other training-related .py and .sh files, you just need to modify them accordingly (I left some comments about the changes in the code).

fun-code-ai commented 2 years ago

Thank you very much!

fun-code-ai commented 2 years ago

Hi, I'm sorry, but I still couldn't run train_PBAFN_stage1.py; it has many issues. Maybe my GPU is not good enough, but I would like to see the results of this project. Can I skip training and run the demo directly? If not, could you please tell me what I need to do? Thanks!

geyuying / PF-AFN

My environment is 1 GPU. How do I adjust the training set? #31