Best way to do Domain Adaptation

lematmat commented 3 years ago

Hi,

As I don't have any labeled data, I'm wondering what the best way is to adapt the NLI and Quora models to my application domain (legal):

- Only fine-tune BERT on my domain-specific corpus, then use the NLI and Quora models as they are to label my dataset
- Do not fine-tune BERT, and train on my augmented dataset instead (NLI-custom or NLI/Quora_custom)
Lastly, if I label my dataset using the Quora model, is there any benefit in appending my newly labeled dataset to the Quora data? I hope I'm not too unclear.

Best regards,
lematmat

nreimers commented 3 years ago

If you have a sufficient number of legal documents, you can continue pre-training BERT on them with the masked language modeling (MLM) objective.
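
A minimal sketch of what continued MLM pre-training could look like with the Hugging Face `transformers` and `datasets` libraries. The base checkpoint `bert-base-uncased`, the output directory `bert-legal-mlm`, the chunk size, and the hyperparameters are illustrative assumptions, not values recommended in this thread; `chunk_documents` is a hypothetical helper.

```python
def chunk_documents(documents, max_words=200):
    """Split long documents into word chunks that fit BERT's input length.

    Hypothetical helper: 200 words is an assumed, conservative chunk size.
    """
    chunks = []
    for doc in documents:
        words = doc.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def continue_pretraining(texts, model_name="bert-base-uncased",
                         output_dir="bert-legal-mlm"):
    """Continue masked-language-model training on unlabeled domain text."""
    # Heavy dependencies are imported lazily so the helper above stays usable
    # without them installed.
    from datasets import Dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Tokenize the raw text; padding is done per batch by the collator.
    dataset = Dataset.from_dict({"text": texts})
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True,
        remove_columns=["text"],
    )

    # The collator masks tokens at random for each batch (15% is BERT's default).
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,               # assumed; tune to corpus size
        per_device_train_batch_size=16,   # assumed; tune to GPU memory
    )
    trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                      data_collator=collator)
    trainer.train()

    # Save model and tokenizer together so the directory can be reloaded later.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

Usage would be along the lines of `continue_pretraining(chunk_documents(my_legal_docs))`, where `my_legal_docs` is a list of plain-text strings.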

Then, you can fine-tune this model on the labeled data you have (or use e.g. Quora data if your task is similar).
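
As a rough sketch of the fine-tuning step with sentence-transformers, assuming Quora-style labeled pairs `(question1, question2, is_duplicate)`: the path `bert-legal-mlm` stands for a hypothetical directory holding the domain-adapted BERT, `clean_pairs` is a hypothetical helper, and the choice of `ContrastiveLoss` and the hyperparameters are assumptions rather than a recipe stated in this thread.

```python
def clean_pairs(rows):
    """Drop pairs with empty text and normalize labels to 0/1 integers.

    Hypothetical helper; `rows` is an iterable of (text_a, text_b, label).
    """
    cleaned = []
    for text_a, text_b, label in rows:
        text_a, text_b = text_a.strip(), text_b.strip()
        if text_a and text_b:
            cleaned.append((text_a, text_b, int(label)))
    return cleaned


def fine_tune(rows, model_path="bert-legal-mlm", output_dir="sbert-legal"):
    """Fine-tune a sentence embedding model on labeled sentence pairs."""
    # Imported lazily so clean_pairs works without these packages installed.
    from sentence_transformers import (InputExample, SentenceTransformer,
                                       losses, models)
    from torch.utils.data import DataLoader

    # Wrap the BERT checkpoint with a pooling layer (mean pooling by default)
    # to obtain fixed-size sentence embeddings.
    word_embedding_model = models.Transformer(model_path, max_seq_length=256)
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension()
    )
    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

    examples = [InputExample(texts=[a, b], label=label)
                for a, b, label in clean_pairs(rows)]
    loader = DataLoader(examples, shuffle=True, batch_size=16)

    # Contrastive loss pulls duplicate pairs together and pushes others apart.
    loss = losses.ContrastiveLoss(model=model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
    model.save(output_dir)
    return output_dir
```

Passing your own labeled legal pairs as `rows` corresponds to the first option above; passing Quora pairs corresponds to the fallback when the task is similar.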

lematmat commented 3 years ago

Ok, thank you very much nreimers.

UKPLab / sentence-transformers