Training Time?

saurabhsharma1993 commented 6 years ago

Hi, thanks for uploading your code. Can you tell me how long it takes to train the model for 90 epochs? (I don't have much disk space, so I'm not storing the intermediate pose maps either.) It's taking 0.5 hours per epoch on my machine (NVIDIA 1080 Ti). Do you get similar times?
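As a quick sanity check on the numbers in the question, total wall-clock time is just epochs multiplied by time per epoch. The sketch below ignores any evaluation or checkpointing overhead, and the helper name is hypothetical.

```python
def total_training_hours(epochs, minutes_per_epoch):
    """Estimate wall-clock training time in hours (epochs x time per epoch).

    Ignores evaluation, checkpointing and data-loading stalls.
    """
    return epochs * minutes_per_epoch / 60.0


# 90 epochs at 0.5 h (30 min) per epoch on the 1080 Ti
print(total_training_hours(90, 30))  # 45.0
```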

AliaksandrSiarohin commented 6 years ago

Hi, for Market it was about 15 min per epoch, so training takes ~22 h on a Titan X. For Fashion it was ~1 h per epoch. I remember that these maps give a significant improvement. One way to alleviate the disk-space problem could be to generate the maps on the GPU. I did not try this, but it can be done using gan/layer_utils/GaussianFromPointsLayer.
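To illustrate why the pose maps can be generated on the fly instead of stored, here is a minimal NumPy sketch that renders one Gaussian heatmap per keypoint with pure broadcasting. It is not the repository's `GaussianFromPointsLayer`; the (row, col) coordinate order, the `-1` marker for missing keypoints and `sigma=6` are assumptions made for the example. Because it uses only elementwise tensor operations, the same expression could in principle be ported to a GPU framework.

```python
import numpy as np

MISSING = -1  # assumed marker for keypoints the pose estimator did not detect


def keypoints_to_heatmaps(keypoints, height, width, sigma=6.0):
    """Render one Gaussian heatmap per keypoint (hypothetical helper).

    keypoints: array-like of shape (K, 2) with (row, col) coordinates.
    Keypoints containing MISSING yield an all-zero map.
    Returns a float32 array of shape (height, width, K).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    rows = np.arange(height, dtype=np.float32)[:, None, None]  # (H, 1, 1)
    cols = np.arange(width, dtype=np.float32)[None, :, None]   # (1, W, 1)
    r0 = keypoints[:, 0][None, None, :]                        # (1, 1, K)
    c0 = keypoints[:, 1][None, None, :]                        # (1, 1, K)

    # Squared distance of every pixel to every keypoint, broadcast to (H, W, K)
    sq_dist = (rows - r0) ** 2 + (cols - c0) ** 2
    maps = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Zero out channels whose keypoint is missing
    missing = (keypoints == MISSING).any(axis=1)               # (K,)
    maps[..., missing] = 0.0
    return maps.astype(np.float32)


heatmaps = keypoints_to_heatmaps([[10, 20], [-1, -1]], height=64, width=32)
print(heatmaps.shape)  # (64, 32, 2)
```

Computing the maps per batch trades disk space for a little extra compute at training time, which is the trade-off the reply above suggests.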

saurabhsharma1993 commented 6 years ago

Okay. Another question: when can I expect to see meaningful images? I have already trained the model for 5 epochs (on Fashion), and so far the results capture very little semantic structure. Moreover, the quality hasn't changed significantly from initialisation to epoch 5, and it seems as if the model isn't learning at all.

AliaksandrSiarohin commented 6 years ago

Not sure what you mean. Here is what I get: 1.zip 2.zip 3.zip 4.zip 5.zip

saurabhsharma1993 commented 6 years ago

Here's what I get. This is the full model, without any changes to your code, running on the Fashion dataset. Do you have any idea what's happening? Did your model converge every time?

git.zip

AliaksandrSiarohin commented 6 years ago

I ran the full model on Fashion only twice, so it may not be very representative. I never observed anything like this in any experiment. It is hard to say what is going on from the images alone. Can you send me the log with the losses and the command you used to launch it?

saurabhsharma1993 commented 6 years ago

Command:

CUDA_VISIBLE_DEVICES=0 nohup python train.py --output_dir output/full/fasion --checkpoints_dir output/full/fasion --warp_skip mask --dataset fasion --l1_penalty_weight 0.01 --nn_loss_area_size 5 --batch_size 2 --content_loss_layer block1_conv2 --number_of_epochs 90 > log_fasion 2>&1 &

Log: log.txt

AliaksandrSiarohin commented 6 years ago

I think the main problem is that I changed the gan submodule and it is no longer compatible, so you should use the old version of it. You get that version if you clone the repository with: git clone --recursive https://github.com/AliaksandrSiarohin/pose-gan/

AliaksandrSiarohin commented 6 years ago

Or check out the gan submodule at the appropriate commit: cd gan; git checkout a4aa9a38792e89480ffbb15dc7e9beba8513e893
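To confirm which commit the gan submodule is actually on before starting a multi-day run, a small check like the following may help. It is a sketch, not part of the repository: it assumes the submodule lives at `gan` relative to the repository root (as the `cd gan` above implies) and relies on the documented format of `git submodule status`, where each line is a one-character state flag followed by the SHA-1 and the path.

```python
import subprocess

# Commit recommended in the thread above
EXPECTED_GAN_COMMIT = "a4aa9a38792e89480ffbb15dc7e9beba8513e893"


def parse_submodule_status(line):
    """Split one line of `git submodule status` into (state, sha, path).

    The first character is ' ' (checked out at the recorded commit),
    '-' (not initialized), '+' (checked-out commit differs from the
    recorded one) or 'U' (merge conflicts).
    """
    state = line[0]
    fields = line[1:].split()
    return state, fields[0], fields[1]


def gan_submodule_is_pinned(repo_dir="."):
    """Return True if the 'gan' submodule is checked out at the expected commit."""
    out = subprocess.run(
        ["git", "submodule", "status", "gan"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Strip only the trailing newline: the leading state flag may be a space
    state, sha, _path = parse_submodule_status(out.rstrip("\n"))
    return state != "-" and sha == EXPECTED_GAN_COMMIT
```

Running `gan_submodule_is_pinned()` from the repository root returns `False` when the submodule is uninitialized or on a different commit, which is the situation that caused the non-converging runs in this thread.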

saurabhsharma1993 commented 6 years ago

Yes, that was the issue. Please update the repo accordingly.

AliaksandrSiarohin commented 6 years ago

I think that if you clone with --recursive, it uses the correct version by default. I added a note to the README.

AliaksandrSiarohin / pose-gan, issue #5