bcmi / GracoNet-Object-Placement

Official code for ECCV2022 paper: Learning Object Placement via Dual-path Graph Completion
MIT License

Training Fails #7

Closed akif-caglar closed 1 year ago

akif-caglar commented 1 year ago

Hi, I am trying to run the training code of GracoNet (main.py) on my local machine to see whether I can reproduce results similar to your pretrained weights. However, training seems to fail. In the first epochs, all objects are placed with a scale ratio of 0, so they do not appear on the screen. By the last epochs (around epoch 9), all objects are pasted at maximum scale in the same location. I have not changed anything in main.py or model.py except the batch_size. Can you help me, please? Are you able to train a new model successfully with the current state of the code on GitHub? Have you encountered an issue like this? Thank you very much for your time and your work.

Siyuan-Zhou commented 1 year ago

@akif-caglar Hi, thanks for your question. Although I have not encountered the issue you describe, I will train the model again with the current code on GitHub. If I run into the same issue, I will try to find the potential bug.

akif-caglar commented 1 year ago

@Siyuan-Zhou Thank you so much. I am looking forward to hearing from you.

akif-caglar commented 1 year ago

Hi, it's me again. I wanted to give more details about the situation and ask for your opinion. These pictures are from the training samples. Around iteration 46500, the generator starts to collapse into a state in which it mostly places very large objects in the same location, or places objects with scale 0 so they are not visible. (sample images attached)

This collapse can also be seen in the losses around iteration 45k. I thought it is because the performance gap between the discriminator and the generator becomes too large at that point, and from there the generator starts to diverge. As a workaround, I am currently freezing the discriminator's weights, and I have also increased the reconstruction loss multiplier from 50 to 450 during training.
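For reference, here is a minimal PyTorch sketch of what I mean (the modules, optimizer, and loss names are placeholders, not the repo's actual generator/discriminator code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the actual generator / discriminator in model.py.
generator = nn.Linear(8, 8)
discriminator = nn.Linear(8, 1)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)

REC_WEIGHT = 450.0  # raised from the default 50, as described above

# Freeze the discriminator so only the generator keeps updating.
for p in discriminator.parameters():
    p.requires_grad = False

x = torch.randn(4, 8)  # dummy batch
fake = generator(x)
loss_rec = nn.functional.mse_loss(fake, x)  # reconstruction term (placeholder)
loss_adv = nn.functional.binary_cross_entropy_with_logits(
    discriminator(fake), torch.ones(4, 1))  # adversarial term against the frozen discriminator

loss_g = REC_WEIGHT * loss_rec + loss_adv
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```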

I would be happy to hear any advice you have. Also, did you train the generator and discriminator simultaneously for the whole training run, without freezing the weights of either one? (images attached)

Siyuan-Zhou commented 1 year ago

I have trained the model again with batch size 8 and encountered the same problem you mentioned. It seems that a larger batch size (like 32) may lead to more reasonable generation results. Moreover, even though the model falls into mode collapse in some epochs, you can continue the training process: the model may behave well again and recover from mode collapse automatically after several epochs. Last but not least, examining performance with the evaluation metrics (instead of only focusing on the losses) is more straightforward, so I suggest printing out the evaluation results of each epoch.
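A minimal sketch of what I mean by printing evaluation results every epoch (`train_one_epoch` and `evaluate` are hypothetical placeholders for the repo's own training loop and evaluation scripts, and the metric names are illustrative only):

```python
def train_one_epoch(epoch):
    ...  # one pass over the training set (placeholder)

def evaluate(epoch):
    # Placeholder: in practice, generate composites on the eval split here and
    # compute the actual evaluation metrics instead of returning dummy values.
    return {"accuracy": 0.0, "fid": 0.0}

num_epochs = 11
for epoch in range(num_epochs):
    train_one_epoch(epoch)
    metrics = evaluate(epoch)
    print(f"epoch {epoch}: " + ", ".join(f"{k}={v:.4f}" for k, v in metrics.items()))
```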

DavidXie03 commented 1 year ago

I am facing a similar problem. Default parameters were used, but the final result is that the generator places every foreground object in the bottom-left corner. (sample images attached)

ustcnewly commented 1 year ago

The adversarial loss may make training difficult; you can try replacing the adversarial loss with a classification loss (using a fixed SOPA model as a binary classifier). Additionally, I recommend FOPA (https://github.com/bcmi/FOPA-Fast-Object-Placement-Assessment), which is more stable and effective.
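A rough, hypothetical sketch of the classification-loss idea (the classifier below is a tiny placeholder standing in for a pretrained, frozen SOPA-style plausibility model, not the actual SOPA/FOPA API):

```python
import torch
import torch.nn as nn

# Placeholder for a pretrained, frozen plausibility classifier (e.g. SOPA).
plausibility_clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
plausibility_clf.eval()
for p in plausibility_clf.parameters():
    p.requires_grad = False  # keep the classifier fixed during training

# Dummy generator output; in the real code this would be the composite image
# produced by pasting the foreground at the predicted placement.
composite = torch.randn(4, 3, 64, 64, requires_grad=True)

logits = plausibility_clf(composite)
# Classification loss: push composites toward the "plausible" (label 1) class.
loss_cls = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.ones_like(logits))
loss_cls.backward()  # gradients flow back to the generator through `composite`
```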