I have tested it on Windows 10; you could try it on Ubuntu.
I have already trained it on Ubuntu, and the problem still exists. I found that most of the time is spent in the backward pass, about 56 seconds, so I tried adding the option --g_reg_every 32 to speed up training. Does that introduce any new problems?
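For context: in rosinality-style stylegan2-pytorch training code, --g_reg_every is the interval (in iterations) between applications of the path-length regularization, and the path loss is scaled by g_reg_every to compensate, so raising it from the default 4 to 32 mostly just skips extra regularization backward passes; whether this repo behaves the same way is an assumption on my part. To confirm where the 56 seconds actually go, a minimal, self-contained timing sketch (with a dummy model standing in for the real generator step) could look like this:

```python
# Minimal sketch (not the repo's code): time forward vs. backward with
# torch.cuda.synchronize() so async CUDA kernels don't distort the numbers.
# Replace the dummy model/loss with the actual training step being profiled.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).to(device)
x = torch.randn(4, 512, device=device)

if device == "cuda":
    torch.cuda.synchronize()
t0 = time.time()

out = model(x)
loss = out.pow(2).mean()
if device == "cuda":
    torch.cuda.synchronize()
t1 = time.time()

loss.backward()
if device == "cuda":
    torch.cuda.synchronize()
t2 = time.time()

print(f"forward: {t1 - t0:.4f}s, backward: {t2 - t1:.4f}s")
```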
I remember we had the same issue before, but it disappeared after switching to another server running an RTX 2080 Ti, so I also don't know how to resolve your issue.
I will try switching to another server too.
How can I improve the training speed?
It takes 2 hours to run 1000 iterations on 2x GeForce RTX 3090, so 10000k iterations would take about 833 days, yet your training took only 20 days.
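The 833-day figure follows directly from that rate; a quick back-of-the-envelope check (assuming seconds-per-iteration stays constant):

```python
# Sanity check of the training-time estimate from the numbers above.
hours_per_1000_iters = 2.0
total_iters = 10_000_000                                  # "10000k" iterations
seconds_per_iter = hours_per_1000_iters * 3600 / 1000     # 7.2 s per iteration
eta_days = total_iters * seconds_per_iter / 86400
print(f"{seconds_per_iter:.1f} s/iter -> ~{eta_days:.0f} days")   # ~833 days
```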
My training command is as follows:
python -m torch.distributed.launch --nproc_per_node=2 --master_port=9999 train.py --num_worker 4 --resolution 1024 --name Jeric --iter 1000 --batch 1 --mixing 0.9 path-to-your-image-folders --condition_path path-to-your-segmap-folders
path-to-your-image-folders is set to the CelebA-HQ-img folder of the CelebA-HQ dataset.
path-to-your-segmap-folders is set to the CelebAMask-HQ folder downloaded from your pre-processed FFHQ and CelebA segmaps (a quick sanity check for these two folders is sketched below).
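Purely as a hypothetical pre-launch check (not part of the repo), it can be worth confirming both folders exist and are non-empty before committing to a multi-day run; the folder names below are just the placeholders from the command above:

```python
# Hypothetical pre-launch check: verify the image and segmap folders exist
# and report how many entries each one holds before starting training.
import os
import sys

image_dir = "path-to-your-image-folders"      # e.g. CelebA-HQ-img
segmap_dir = "path-to-your-segmap-folders"    # e.g. CelebAMask-HQ segmaps

for d in (image_dir, segmap_dir):
    if not os.path.isdir(d):
        sys.exit(f"missing folder: {d}")

n_images = sum(1 for _ in os.scandir(image_dir))
n_segmaps = sum(1 for _ in os.scandir(segmap_dir))
print(f"{n_images} image entries, {n_segmaps} segmap entries")
if n_images == 0 or n_segmaps == 0:
    sys.exit("one of the folders is empty")
```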
Trained on Windows 10.
Thanks.