google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Fine tuning BERT to extract embeddings (like ELMo) #145

Closed: tjaffri closed this issue 5 years ago

tjaffri commented 5 years ago

I'm looking to use BERT to create contextual embeddings of words in my documents. This is similar to ELMo as noted in the README.

My question is about fine-tuning. Reading the relevant section of the README, it looks like fine-tuning requires labeled data, e.g. the GLUE benchmark data. However, with ELMo I am able to fine-tune simply using my unlabeled text corpus, the idea being that I want the model adapted to the ways my specific corpus is different (different word shapes, different technical jargon, etc.).

What's the best way to fine-tune BERT in this way? One thought is to fine-tune the same way the first step of pre-training is done, i.e. how the model was pre-trained on Wikipedia etc. The idea is that I could initialize the model with the weights from the pre-trained model and then continue training on the text from my corpus. Is that the right way? Any pointers on how to get started?
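For concreteness, my understanding from the README is that the unlabeled corpus would just need to be a plain text file with one sentence per line and a blank line between documents, something like this (the file name and contents below are only placeholders):

```shell
# Hypothetical corpus file (my_corpus.txt is a placeholder name):
# one sentence per line, documents separated by a blank line.
cat > my_corpus.txt <<'EOF'
The first sentence of the first document, using my domain's jargon.
The second sentence of the first document.

The first sentence of the second document.
Another sentence of the second document.
EOF
```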

avostryakov commented 5 years ago

I think that is the right way. Please see the "Pre-training with BERT" section at https://github.com/google-research/bert#pre-training-with-bert. In short, there are two steps: preparing the training data and then the pre-training itself.
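For example, something close to the README's commands should work; $BERT_BASE_DIR points at an unpacked pre-trained model, and all paths and hyperparameters below are placeholders to adjust. The key flag is --init_checkpoint in the second step, which continues pre-training from the released checkpoint instead of starting from scratch:

```shell
# Step 1: turn the plain-text corpus into masked-LM / next-sentence TFRecords.
python create_pretraining_data.py \
  --input_file=./my_corpus.txt \
  --output_file=/tmp/tf_examples.tfrecord \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5

# Step 2: continue pre-training, initialized from the released BERT checkpoint.
python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5
# num_train_steps=20 is the README's toy value; you would use far more steps
# on a real domain corpus.
```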

jacobdevlin-google commented 5 years ago

Yes, avostryakov is correct.

armandidandeh commented 5 years ago

@tjaffri I wonder where you landed. Fine-tuning by nature depends on your task; that is why it needs labelled data. This read helped shed some light for me on how to generate the masked labelled data: https://arxiv.org/abs/1801.07736. Maybe it will help you or someone else who lands here in the future.

KavyaGujjala commented 5 years ago

Hi, I am also working on the same kind of problem. I want to train a BERT model on my domain-specific data to get contextual embeddings. How do I proceed with this? Even the create_pretraining_data and run_pretraining scripts given on the BERT site seem to be for either sentence classification or next sentence prediction, so I am kind of stuck there. Can you please help me through this?
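From the README's section on using BERT to extract fixed feature vectors (like ELMo), I think the extraction step itself would look roughly like this (all paths are placeholders), with --init_checkpoint presumably pointing at the checkpoint written by run_pretraining after training on my domain data, but I am not sure:

```shell
# extract_features.py writes one JSON line per input sentence with the
# activations of the requested layers for every token.
# /tmp/input.txt holds one sentence per line (or "sent A ||| sent B" pairs).
python extract_features.py \
  --input_file=/tmp/input.txt \
  --output_file=/tmp/output.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=/tmp/pretraining_output/model.ckpt-20 \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8
# model.ckpt-20 is a placeholder; the checkpoint name depends on how many
# pre-training steps were run.
```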