JohnAILove opened this issue 3 years ago
Hi, my environment also has only 1 GPU. Have you successfully run the training scripts, such as train_PBAFN_stage1.py?
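Not yet. Running it fails with:
-------------- End ----------------
THCudaCheck FAIL file=/pytorch/torch/csrc/cuda/Module.cpp line=33 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "train_PBAFN_stage1.py", line 30, in <module>
    torch.cuda.set_device(opt.local_rank)
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /pytorch/torch/csrc/cuda/Module.cpp:33
But I can run the demo.
We can find a solution together.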
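A hedged note on this error: it is raised by torch.cuda.set_device(opt.local_rank) at line 30 of train_PBAFN_stage1.py. With one visible GPU, the only valid device index is 0, so "invalid device ordinal" most likely means opt.local_rank was 1 or higher, for example because the script was started by a multi-process distributed launcher (an assumption; the launch command is not shown here). The helper below, check_local_rank, is a hypothetical name and not part of the repository; it is a minimal sketch that turns this failure into a readable message before set_device is called.

```python
def check_local_rank(local_rank: int, device_count: int) -> int:
    """Return local_rank if it names a visible GPU, otherwise raise a clear error.

    Hypothetical helper. In a real script, device_count would come from
    torch.cuda.device_count(); an index >= that count is what makes
    torch.cuda.set_device fail with "invalid device ordinal".
    """
    if device_count < 1:
        raise RuntimeError("no CUDA device is visible to this process")
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank={local_rank}, but only {device_count} GPU(s) are visible; "
            "on a single-GPU machine, run one process with local_rank 0"
        )
    return local_rank


# Usage (assumes the script's own opt.local_rank and an installed torch):
#   torch.cuda.set_device(check_local_rank(opt.local_rank, torch.cuda.device_count()))
```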
Our training code uses PyTorch's distributed data parallel (nn.parallel.DistributedDataParallel). To train the model on one GPU, remove DistributedDataParallel and move the model to the GPU by simply calling model.cuda(). You also need to change how the data is loaded; you can refer to https://github.com/switchablenorms/DeepFashion_Try_On/blob/master/ACGPN_train/train.py
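The advice above can be sketched roughly as follows. This is not the repository's code: nn.Linear and TensorDataset are hypothetical stand-ins for the PBAFN networks and the try-on dataset, and the comments about what a "typical DDP script" does are assumptions about the usual pattern, not a description of train_PBAFN_stage1.py. The maintainer suggests model.cuda(); the sketch uses .to(device) with a CPU fallback only so that it runs on any machine.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_single_gpu(model: nn.Module, dataset, batch_size: int):
    """Single-process setup without DistributedDataParallel (sketch)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # A typical DDP script wraps the model:
    #   nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    # On one GPU, just move the model to that device instead.
    model = model.to(device)
    # A typical DDP script passes a DistributedSampler to DataLoader(sampler=...).
    # Without DDP, a plain DataLoader with shuffle=True covers the whole dataset.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)
    return model, loader, device


# Hypothetical stand-ins for the real network and dataset.
toy_model = nn.Linear(8, 2)
toy_data = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
model, loader, device = build_single_gpu(toy_model, toy_data, batch_size=4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for x, y in loader:  # one epoch over the toy data
    x, y = x.to(device), y.to(device)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If the real script also calls torch.distributed.init_process_group, that call is not needed once DDP is removed, and torch.cuda.set_device(opt.local_rank) is only safe on a single GPU when opt.local_rank is 0.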
Have you managed to run it after following Ge's guidance?
Hey guys, I succeeded in running the first-stage training code on a single GPU on Windows 10. Maybe you could try my code. Note that I strictly followed @geyuying's instructions on module versions.
I only modified 2 files, and you can get my code here. I'm still training the first stage, which takes a while, but there have been no errors so far.
Wish you success 👍
Hey friend, I couldn't open this link. Could you please send it again? Thanks.
Hi, sorry, I cleared out my Google Drive at one point. You can now get the files via this link. For the other training-related .py and .sh files, you just need to modify them accordingly (I left some comments about the changes in the code).
Thanks very much!
Hi, I'm sorry, but I still can't run train_PBAFN_stage1.py; it has many issues. Maybe my GPU isn't good enough, but I want to see the results of this project. Can I skip training and run the demo directly? If not, could you please tell me what I need to do? Thanks!
My environment has only 1 GPU. How do I adjust the training settings?