deepset-ai / FARM

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

About target QA task fine-tuning using AdaptiveModel #762

Closed gabinguo closed 2 years ago

gabinguo commented 3 years ago

Question

Hello, about fine-tuning on the target QA task: do we need to re-instantiate the prediction head or not? For example, I have Roberta-base fine-tuned on SQuAD (Roberta-base-SQuAD), and I need to fine-tune it further on my target QA task. For the prediction_head, should I load the head trained on SQuAD or just re-instantiate a new one?

Additional context

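```python
# Option 1: load the model together with the head trained on SQuAD
# (init_checkpoint, device and data_silo come from the surrounding script)
from farm.modeling.adaptive_model import AdaptiveModel

model = AdaptiveModel.load(init_checkpoint, device=device)
model.connect_heads_with_processor(data_silo.processor.tasks, require_labels=True)
```

or

```python
# Option 2: load only the language model and re-instantiate the head
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import QuestionAnsweringHead

language_model = LanguageModel.load(init_checkpoint)
prediction_head = QuestionAnsweringHead()
model = AdaptiveModel(...)
```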
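The two options differ only in where the QA head's weights start from. The toy sketch below illustrates this under stated assumptions and is not FARM code: weights are plain Python lists instead of tensors, the checkpoint is a hypothetical dict with `language_model` and `qa_head` entries, and `build_qa_model`, `init_fresh_head` and `span_logits` are made-up names. The head is modelled as a hidden_size x 2 linear layer that yields a start and an end logit per token.

```python
import random


def init_fresh_head(hidden_size, seed=0):
    """Randomly initialize a hidden_size x 2 head (start and end logit weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(2)] for _ in range(hidden_size)]


def build_qa_model(checkpoint, reuse_head, seed=0):
    """Build a toy QA model from a hypothetical checkpoint dict.

    The language model weights are always copied from the checkpoint.
    reuse_head=True keeps the SQuAD-trained head (option 1);
    reuse_head=False swaps in a freshly initialized head (option 2).
    """
    language_model = [row[:] for row in checkpoint["language_model"]]
    if reuse_head:
        qa_head = [row[:] for row in checkpoint["qa_head"]]
    else:
        qa_head = init_fresh_head(len(checkpoint["qa_head"]), seed)
    return {"language_model": language_model, "qa_head": qa_head}


def span_logits(qa_head, token_vector):
    """Start and end logits for one token: token_vector multiplied by qa_head."""
    start = sum(x * w[0] for x, w in zip(token_vector, qa_head))
    end = sum(x * w[1] for x, w in zip(token_vector, qa_head))
    return start, end


# Toy checkpoint with hidden_size = 2
checkpoint = {
    "language_model": [[0.1, 0.2], [0.3, 0.4]],
    "qa_head": [[0.5, -0.5], [1.0, 0.0]],
}
full = build_qa_model(checkpoint, reuse_head=True)
fresh = build_qa_model(checkpoint, reuse_head=False)
print(full["language_model"] == fresh["language_model"])  # True: same LM
print(full["qa_head"] == checkpoint["qa_head"])  # True: SQuAD head kept
print(span_logits(full["qa_head"], [1.0, 2.0]))  # (2.5, -0.5)
```

In both cases the language model is identical; option 2 simply discards whatever span scoring the head learned on SQuAD and has to relearn it from the target data.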
Timoeller commented 3 years ago

Very good question. Our experience shows that, especially for small datasets, you should load the full model (language model and prediction head) with AdaptiveModel.load.

If your QA dataset is decently large (5k+ QA pairs), you should also be fine with just initializing the model. We haven't done tests to see whether initializing only the LM, and not the prediction head, helps in adapting the model to out-of-domain data. Maybe you could try it and report back here?
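The rule of thumb in this answer can be written as a tiny decision helper. This is a hypothetical sketch: `init_options` and the option labels are made-up names, and the 5,000-pair threshold is the rough "5k+" figure from the answer rather than a tuned cut-off.

```python
# Rough "decently large" cut-off from the answer (5k+ QA pairs), not a tuned value.
LARGE_DATASET_THRESHOLD = 5000


def init_options(num_qa_pairs, threshold=LARGE_DATASET_THRESHOLD):
    """Return the head-initialization options worth trying, per the rule of thumb."""
    if num_qa_pairs < threshold:
        # Small dataset: load the full model, SQuAD-trained head included.
        return ["load_full_model"]
    # Large dataset: a freshly initialized head should also be fine.
    # Whether it helps on out-of-domain data is untested.
    return ["load_full_model", "fresh_head"]


print(init_options(2000))  # ['load_full_model']
print(init_options(8000))  # ['load_full_model', 'fresh_head']
```

Loading the full model stays in the list in both branches, since the answer only says a fresh head is also acceptable for larger datasets, not that it is better.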

gabinguo commented 3 years ago

Thanks for the reply : )

Glad to know the effect for small datasets, since I am experimenting with some small QA sets of ~2,000 QA pairs.

> We haven't done tests to see whether initializing only the LM, and not the prediction head, helps in adapting the model to out-of-domain data. Maybe you could try it and report back here?

Sure, I will try to run an experiment on this.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.