How to actually use the fine tuned model?

google-research / bert

TensorFlow code and pre-trained models for BERT

https://arxiv.org/abs/1810.04805

Apache License 2.0


How to actually use the fine tuned model? #339

Open CapitalZe opened 5 years ago

CapitalZe commented 5 years ago

I have successfully fine-tuned BERT for NER on the RCV1 dataset, after modifying run_classifier.py and some other scripts.

The results are satisfactory for the time being.

Anyway, I am fairly new to all of this and am completely stumped as to how I use the now fine-tuned and trained model I have.

How would I feed in text for the BiLSTM to predict next sentence for NER? Would it require a modification of run_squad.py? I cannot find anything, and any pointers or guides would be extremely helpful and highly appreciated.

Many thanks!

pfecht commented 5 years ago

BERT, as the name suggests, is encoder-only, so for text generation its output can't simply be fine-tuned with a linear layer on top. BERT is pre-trained with masked language modeling and next sentence prediction rather than next-word prediction, so it is not made for language modeling tasks out of the box, and as far as I know there are no implementations or descriptions available that address this problem. Please correct me if I'm wrong.

It also doesn't seem very intuitive to use an RNN decoder with a Transformer encoder model. It may be worth trying to build your own encoder-decoder attention layer. However, I'd be grateful for any input as well.

hanxiao commented 5 years ago

You can also use bert-as-service to extract features using a fine-tuned model, see https://github.com/hanxiao/bert-as-service/#serving-a-fine-tuned-bert-model

CapitalZe commented 5 years ago

Thanks! I will look into this now, seems like a great resource and what I was looking for.

macanv commented 5 years ago

You can add a CRF output layer for sequence labeling on top of BERT's last-layer output embeddings. See https://github.com/macanv/BERT-BiLSTM-CRF-NER for an implementation. If something is wrong, please tell me.
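
To make the CRF suggestion concrete, here is a minimal sketch of the decoding step such a layer performs at prediction time. It is not taken from the linked repository. It assumes you already have per-token tag scores (`emissions`, for example from a linear projection of BERT's last-layer outputs) and a learned tag-to-tag score matrix (`transitions`); both names and the function `viterbi_decode` are hypothetical.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   -- score of tag j at token position t (assumed input)
    transitions[i][j] -- score of moving from tag i to tag j (assumed input)
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            best_prev = max(
                range(num_tags), key=lambda i: score[i] + transitions[i][j]
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + transitions[best_prev][j] + emissions[t][j]
            )
        score = new_score
        backpointers.append(pointers)

    # Walk the backpointers from the best final tag to recover the path.
    best_tag = max(range(num_tags), key=lambda i: score[i])
    path = [best_tag]
    for pointers in reversed(backpointers):
        best_tag = pointers[best_tag]
        path.append(best_tag)
    path.reverse()
    return path
```

The returned indices map back to your own label list (for example `["O", "B-PER", ...]`). The transition scores are what let a CRF rule out invalid tag sequences that independent per-token argmax would allow.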

davidswelt commented 5 years ago

Is there some example source code that demonstrates how to use tf.data.Dataset to pipe data into a BERT language model?
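
One possible shape for this, as a rough sketch rather than an official example: BERT's model code expects three fixed-length integer inputs per example (`input_ids`, `input_mask`, `segment_ids`), and a dict of those can be handed to `tf.data.Dataset.from_tensor_slices`. The helper names `pad_features` and `make_dataset` are hypothetical, the token IDs are assumed to come from the repository's tokenizer with `[CLS]` and `[SEP]` already added, and the truncation here is naive.

```python
from typing import Dict, List


def pad_features(
    token_ids: List[List[int]], max_seq_length: int
) -> Dict[str, List[List[int]]]:
    """Pad (or naively truncate) tokenized ID sequences to a fixed length."""
    features = {"input_ids": [], "input_mask": [], "segment_ids": []}
    for ids in token_ids:
        ids = ids[:max_seq_length]  # naive: may cut off a trailing [SEP]
        pad = max_seq_length - len(ids)
        features["input_ids"].append(ids + [0] * pad)
        # 1 marks real tokens, 0 marks padding.
        features["input_mask"].append([1] * len(ids) + [0] * pad)
        # Single-sentence input: every position belongs to segment 0.
        features["segment_ids"].append([0] * max_seq_length)
    return features


def make_dataset(features: Dict[str, List[List[int]]], batch_size: int):
    """Wrap the padded features in a batched tf.data.Dataset."""
    import tensorflow as tf  # imported lazily so pad_features works without TF

    return tf.data.Dataset.from_tensor_slices(features).batch(batch_size)
```

Each element of the resulting dataset is a dict of batched tensors keyed by feature name, which is the structure an Estimator-style `input_fn` would return.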

JimAva commented 5 years ago

What is the best solution to use a fine-tuned model in production and return predictions? Bert-as-service only provides the sentence encoding but does not allow for returning predictions, unless I missed that part.
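
One low-tech option, offered as a sketch and not as the recommended production setup: run_classifier.py in this repository accepts `--do_predict=true`, which writes one line of tab-separated class probabilities per input example to `test_results.tsv` in the output directory. Assuming that file format, a hypothetical helper like `read_predictions` below turns those rows into labels. The label list is whatever your own processor defines; the one in the usage note is made up.

```python
from typing import List


def read_predictions(tsv_text: str, labels: List[str]) -> List[str]:
    """Map each row of tab-separated class probabilities to its top label."""
    predictions = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        probs = [float(value) for value in line.split("\t")]
        # Index of the largest probability in this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(labels[best])
    return predictions
```

For example, `read_predictions(open("test_results.tsv").read(), ["negative", "positive"])` would return one label per test example, in input order.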