Closed ganlumomo closed 3 years ago
Hi. Thanks for the interest. If you want a quick way of setting the weights, I would suggest tracking the task-specific losses over a few iterations/epochs for every task, then adjusting the task weights so that all of the losses are of similar magnitude. For example, suppose we solve two tasks, each with weight 1.0, and the average loss of task 1 is 100 times the average loss of task 2. Then I would train with a weight of 100 for the second task. Also, if you use a multi-task baseline, I find that Adam can sometimes achieve slightly higher numbers than SGD.
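The loss-tracking heuristic above can be sketched roughly as follows (a hypothetical helper, not code from this repo; the rule of scaling each task's weight by the reference task's average loss is my reading of the description):

```python
# Sketch of the balancing heuristic: track each task's average loss over a
# few warm-up iterations/epochs, then pick weights so the weighted losses
# end up at a similar magnitude.

def balance_weights(avg_losses, reference_task=0):
    """Return per-task weights that bring every task's average loss
    to the same magnitude as the reference task's average loss."""
    ref = avg_losses[reference_task]
    return [ref / loss for loss in avg_losses]

# Example matching the comment above: task 1's average loss is 100x
# task 2's, so task 2 gets a weight of 100 while task 1 keeps 1.0.
avg_losses = [50.0, 0.5]  # hypothetical averages tracked during warm-up
weights = balance_weights(avg_losses)
print(weights)  # [1.0, 100.0]
```

From there, further trial-and-error tuning around these initial values is still possible, as noted below.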
Of course, you might get even better results if you further tune the weights through trial and error. However, I find that the proposed method works well in practice.
Good luck.
Hi @SimonVandenhende,
Thank you so much for the suggestions. For the Adam optimizer, does the amsgrad parameter need to be set to True?
Best.
Hi @ganlumomo
I only explored the regular Adam optimizer and SGD (see the get_optimizer function in utils/common_config.py). It should be possible to get good results by combining these optimizers with properly initialized weights (following the procedure outlined above).
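For reference, a minimal optimizer factory in PyTorch could look like the sketch below. It only loosely mirrors what get_optimizer in utils/common_config.py might do; the function signature and the default hyperparameters here are assumptions, not the repo's actual values. Note that torch.optim.Adam leaves amsgrad at False by default, i.e. this is "regular" Adam:

```python
import torch

# Hypothetical optimizer factory (not the repo's implementation).
# Learning rate and momentum defaults below are placeholder assumptions.
def get_optimizer(name, params, lr=1e-4):
    if name == 'adam':
        # Regular Adam; amsgrad keeps its default value of False.
        return torch.optim.Adam(params, lr=lr)
    elif name == 'sgd':
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    else:
        raise ValueError(f'Unknown optimizer: {name}')

model = torch.nn.Linear(4, 2)
optimizer = get_optimizer('adam', model.parameters())
```

Either choice should work once the task weights are initialized with the loss-balancing procedure described earlier in the thread.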
Hi, I am very impressed by your survey on MTL, from which I have learned a lot. I am currently working on an MTL project, so I am very curious about the grid search experiments for the fixed weights. I have not found details about this in your paper or in this repo. Could you give me more information on this? What exactly were the grid-searched weights? Did you use all combinations of those weights to train the MTL network and evaluate it? If I want to find the best weights for my MTL network, do I need to run the same experiments? Could you give me some suggestions on this? Thank you so much!