Question about Training from scratch.

toke1220 commented 2 years ago

Hello @djiajunustc, thank you for your excellent work! I am training TransVG on a single GPU (Titan V, 12 GB). I trained two models at the same time: one is the unmodified source code from this repository, and the other includes some personal improvements. However, for both models, at 30 epochs the loss fluctuated around 0.65 with no obvious convergence trend, and val_acc was about 69%. By 40 epochs, both models began to show a significant increase in loss and a decrease in val_acc. I wonder whether the difference between your setup (training with 8 GPUs) and mine (training with one GPU) could cause this problem. In addition, could you please provide your log file so that I can refer to it for my work? My email is wuxuming96@foxmail.com. Thank you again for your work!

jianghaojun commented 2 years ago

Did you set a smaller learning rate? Since you train the model on one GPU, the batch size is only 1/8 of the original, so the original learning rate is too large for your training setting.
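One common heuristic for choosing the smaller learning rate is the linear scaling rule: scale the learning rate by the same factor as the total batch size (a square-root rule is sometimes preferred with Adam-style optimizers). The sketch below is a minimal illustration of that arithmetic, not the repository's own method. The function name `scale_learning_rate`, the per-GPU batch size of 8, and the base learning rate of 1e-4 are hypothetical placeholders; use the actual values from your training command. If the training script sets separate learning rates for different parameter groups, each one would presumably need the same rescaling.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int,
                        new_batch_size: int, rule: str = "linear") -> float:
    """Rescale a learning rate tuned for base_batch_size to new_batch_size.

    Hypothetical helper: "linear" multiplies by the batch-size ratio,
    "sqrt" multiplies by the square root of that ratio.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    # Hypothetical setup: the reference run uses 8 GPUs x 8 samples per GPU,
    # the single-GPU run keeps 8 samples per GPU, so the total batch is 1/8.
    per_gpu_batch = 8
    base_lr = 1e-4  # hypothetical; take the real value from your script
    new_lr = scale_learning_rate(base_lr, 8 * per_gpu_batch, 1 * per_gpu_batch)
    print(f"{new_lr:.2e}")  # 1.25e-05
```

The alternative is to keep the original learning rate and accumulate gradients over 8 steps before each optimizer update, which restores the original effective batch size at the cost of slower wall-clock training.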

toke1220 commented 2 years ago

> Did you set a smaller learning rate? Since you train the model on one GPU, the batch size is only 1/8 of the original, so the original learning rate is too large for your training setting.

Thanks for your suggestion. I will try it.

preetom-saha-arko commented 1 year ago

@toke1220 Did lowering the learning rate solve your problem?

djiajunustc / TransVG, issue #23