[Description]
This was published on the author's own site on 29 May 2017.
The author mainly walks through what multi-task learning is and common practice.
In short, several related tasks share part of a common neural network, which hopefully reduces the network/parameter size and increases the information extracted from the data of the different tasks.
By sharing representations between related tasks, we can enable our model to generalize better on our original task. This approach is called Multi-Task Learning (MTL).
What should I share in my model?
Most approaches in the history of MTL have focused on the scenario where tasks are drawn from the same distribution (Baxter, 1997). While this scenario is beneficial for sharing, it does not always hold. In order to develop robust models for MTL, we thus have to be able to deal with unrelated or only loosely related tasks.
...
As mentioned initially, we are doing MTL as soon as we are optimizing more than one loss function. Rather than constraining our model to compress the knowledge of all tasks into the same parameter space, it is thus helpful to draw on the advances in MTL that we have discussed and enable our model to learn how the tasks should interact with each other.
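A minimal sketch of "optimizing more than one loss function": the joint MTL objective is typically a weighted sum of per-task losses. The loss choices and weights below are illustrative assumptions, not from the post.

```python
import numpy as np

def mtl_loss(pred_a, target_a, pred_b, target_b, w_a=1.0, w_b=0.5):
    """Joint MTL objective: a weighted sum of per-task losses.
    Task A uses squared error, task B uses absolute error; the
    weights w_a/w_b are illustrative hyperparameters."""
    loss_a = np.mean((pred_a - target_a) ** 2)
    loss_b = np.mean(np.abs(pred_b - target_b))
    return w_a * loss_a + w_b * loss_b

total = mtl_loss(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                 np.array([0.5]), np.array([0.0]))
print(total)  # 1.0 * 0.5 (task A MSE) + 0.5 * 0.5 (task B MAE) = 0.75
```

In practice the task weights are hyperparameters, and more advanced approaches (e.g. learning the weights from task uncertainty) let the model decide how the tasks should interact.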
Link: https://ruder.io/multi-task/
Two MTL methods for Deep Learning
Hard parameter sharing: https://ruder.io/content/images/2017/05/mtl_images-001-2.png
Soft parameter sharing: https://ruder.io/content/images/2017/05/mtl_images-002-1.png
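To make the hard-parameter-sharing picture concrete: one shared hidden layer feeds several task-specific output heads. This is a hypothetical sketch with illustrative layer sizes, not code from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hard parameter sharing: a single shared layer feeds two
# task-specific heads. All sizes here are illustrative.
W_shared = rng.normal(size=(8, 16))    # shared by both tasks
W_task_a = rng.normal(size=(16, 3))    # head for task A (e.g. 3-way classification)
W_task_b = rng.normal(size=(16, 1))    # head for task B (e.g. regression)

def forward(x):
    h = np.tanh(x @ W_shared)          # shared representation
    return h @ W_task_a, h @ W_task_b  # task-specific outputs

out_a, out_b = forward(rng.normal(size=(4, 8)))  # batch of 4 inputs
print(out_a.shape, out_b.shape)  # (4, 3) (4, 1)
```

Soft parameter sharing, by contrast, would keep a separate set of parameters per task and instead add a regularizer (e.g. the distance between the two tasks' weight matrices) to the joint loss to keep them similar.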
Recent work on MTL for Deep Learning
Different Tasks: