google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

How to share BERT between tasks in multi-task setting? #504

Open svboeing opened 5 years ago

svboeing commented 5 years ago

Hello. I am trying to reproduce the BERT-based multi-task learning model from this paper. I want to jointly fine-tune BERT on all tasks from the GLUE dataset by sharing a single BERT encoder across all tasks and adding task-specific layers and losses on top of it. The tasks are different and have different losses.

I use run_classifier.py as a starting point. There, model_fn calls create_model; inside create_model a BERT instance is built, the loss is computed, and the loss is returned to model_fn. Now I want to pass batches from different tasks through the same BERT instance and then compute task-specific layers and losses from BERT's output tensor. Say I feed a batch from one randomly chosen task at each training step. How can I make create_model compute a different loss at different training steps, given that I cannot dynamically route BERT's output to these different losses at run time? In other words, how can I share the same BERT model between different tasks without duplicating it? I would like to keep the same Estimator API logic as in the original BERT implementation. Thank you very much.
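For concreteness, here is roughly what I would like create_model to do, if something like this is even possible with the Estimator API. This is only a sketch of the idea: the task_id feature, the head layers, and the label handling are hypothetical simplifications, not working code from the repo.

```python
import tensorflow as tf
import modeling  # modeling.py from this repo

def model_fn(features, labels, mode, params):
  # One shared BERT instance, regardless of which task the batch comes from.
  bert = modeling.BertModel(
      config=params["bert_config"],
      is_training=(mode == tf.estimator.ModeKeys.TRAIN),
      input_ids=features["input_ids"],
      input_mask=features["input_mask"],
      token_type_ids=features["segment_ids"])
  pooled = bert.get_pooled_output()

  def mrpc_loss():  # classification head (hypothetical)
    logits = tf.layers.dense(pooled, 2, name="mrpc_head")
    return tf.losses.sparse_softmax_cross_entropy(labels, logits)

  def cola_loss():  # another classification head (hypothetical)
    logits = tf.layers.dense(pooled, 2, name="cola_head")
    return tf.losses.sparse_softmax_cross_entropy(labels, logits)

  # Each batch would carry a scalar task_id feature; tf.case picks the
  # matching task loss at run time while both heads live in the same graph.
  task_id = features["task_id"][0]
  loss = tf.case(
      {tf.equal(task_id, 0): mrpc_loss,
       tf.equal(task_id, 1): cola_loss},
      exclusive=True)

  train_op = tf.train.AdamOptimizer(learning_rate=2e-5).minimize(
      loss, global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
```

Both heads here are classification heads only to keep the labels tensor uniform; with a regression task such as STS-B the label handling would also have to be routed per task.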

AIGyan commented 5 years ago

You can define the loss in two ways: 1) define a loss function for each task and optimize them separately, or 2) define a loss function for each task and optimize them jointly.

The first way suits alternate training: you have a batch of task 1 data and a batch of task 2 data, and at each step you call one task's optimizer and update the network on that task alone.

The second way is better suited if you want to learn both tasks at the same time. You simply add the losses and optimize the joint loss. This preserves separate task-specific loss functions while training both tasks together. I wanted to train the network jointly, so I used this method.

total_loss = flower_type_loss + flower_color_loss
train = optimizer.minimize(total_loss)

Now, instead of optimizing both losses separately, you optimize a single joint loss. You define one optimizer that is responsible for minimizing total_loss.

Source: https://medium.com/@kajalgupta/multi-task-learning-with-deep-neural-networks-7544f8b7b4e3
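For what it's worth, the second (joint-loss) way looks roughly like this in runnable TF1 form. The two flower heads, their sizes, and the shared feature tensor are made-up placeholders; in the BERT setting the shared tensor would be the pooled encoder output.

```python
import tensorflow as tf

# Shared representation, e.g. BERT's pooled output in the multi-task setting.
shared = tf.placeholder(tf.float32, [None, 768])
type_labels = tf.placeholder(tf.int32, [None])
color_labels = tf.placeholder(tf.int32, [None])

# Two task-specific heads on top of the same shared features.
type_logits = tf.layers.dense(shared, 5, name="flower_type_head")
color_logits = tf.layers.dense(shared, 7, name="flower_color_head")

flower_type_loss = tf.losses.sparse_softmax_cross_entropy(type_labels, type_logits)
flower_color_loss = tf.losses.sparse_softmax_cross_entropy(color_labels, color_logits)

# One joint objective: every training step updates both heads and the
# shared layers below them.
total_loss = flower_type_loss + flower_color_loss
train = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(total_loss)
```

For the first way you would instead build one minimize op per task loss and, at each step, run whichever op matches the batch you are feeding.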

littlejiafan commented 5 years ago

I have run into the same problem and still don't know how to deal with it. Have you solved it? Please share your approach if you have. Thanks a lot.

hsm207 commented 5 years ago

Maybe it's easier if we use the PyTorch implementation here:

https://github.com/namisan/mt-dnn

mdmustafizurrahman commented 5 years ago

Following. I also need to share the same BERT model for a pairwise ranking loss implementation. Here is my issue: #761
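In case it helps, a pairwise hinge ranking loss over one shared BERT encoder can be sketched roughly as below. The variable scope, the rank_head layer, and the margin are hypothetical choices for illustration, not code from this repo.

```python
import tensorflow as tf
import modeling  # modeling.py from this repo

def relevance_score(bert_config, is_training, inputs):
  # reuse=tf.AUTO_REUSE lets the positive and negative pair share one BERT.
  with tf.variable_scope("shared_bert", reuse=tf.AUTO_REUSE):
    bert = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=inputs["input_ids"],
        input_mask=inputs["input_mask"],
        token_type_ids=inputs["segment_ids"])
    score = tf.layers.dense(bert.get_pooled_output(), 1, name="rank_head")
    return tf.squeeze(score, axis=-1)

def pairwise_ranking_loss(bert_config, is_training, pos, neg, margin=1.0):
  s_pos = relevance_score(bert_config, is_training, pos)
  s_neg = relevance_score(bert_config, is_training, neg)
  # Hinge loss: the relevant pair should score higher than the
  # irrelevant one by at least the margin.
  return tf.reduce_mean(tf.maximum(0.0, margin - (s_pos - s_neg)))
```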

Allysnow01 commented 4 years ago

Mark. The same question.

Pager07 commented 3 years ago

You can define the loss in two ways.

  1. Define a loss function for each task and optimize them separately.
  2. Define a loss function for each task and optimize them jointly.

The first way suits alternate training: you have a batch of task 1 data and a batch of task 2 data, and at each step you call one task's optimizer and update the network on that task alone.

The second way is better suited if you want to learn both tasks at the same time. You simply add the losses and optimize the joint loss. This preserves separate task-specific loss functions while training both tasks together. I wanted to train the network jointly, so I used this method.

total_loss = flower_type_loss + flower_color_loss
train = optimizer.minimize(total_loss)

Now, instead of optimizing both losses separately, you optimize a single joint loss. You define one optimizer that is responsible for minimizing total_loss.

Source: https://medium.com/@kajalgupta/multi-task-learning-with-deep-neural-networks-7544f8b7b4e3

https://jg8610.github.io/Multi-Task/