facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Is it possible to use BERT embeddings with the Hierarchical Neural Story Generation model? #1244

Closed hc09141 closed 4 years ago

hc09141 commented 5 years ago

We're working on a language generation task where we have relatively little data available and have been using the "Hierarchical Neural Story Generation" command line tools (thanks, they're really great!). We'd really like to work with a vocabulary larger than what our dataset contains and wonder whether utilising a pre-trained BERT model for producing word embeddings might help with this. Do you have any recommendations for going about this?

huihuifan commented 5 years ago

Hi, thanks for your interest. I'm glad you are finding the library useful. Would you like to use BERT embeddings to initialize your model, or would you like to be able to extend beyond the vocabulary of your dataset? For the latter, you could model subwords (using BPE, for example) instead of full words. For using BERT, you could forward the pre-trained BERT model on the encoder side to produce contextual embeddings.
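
For the BERT option, a minimal sketch of what "forwarding the pre-trained BERT model on the encoder side" could look like, assuming the HuggingFace `transformers` package (which is not part of fairseq; any BERT implementation would work similarly):

```python
# Produce contextual BERT embeddings for a batch of source sentences.
# These could feed the encoder in place of (or in addition to) the usual
# learned token embeddings.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()  # kept frozen in this sketch; it could also be fine-tuned

sentences = ["the knight rode into the forest", "a storm was coming"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = bert(**batch)

# (batch, seq_len, 768) contextual embeddings
contextual = outputs.last_hidden_state
print(contextual.shape)
```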

hc09141 commented 5 years ago

Thanks for your suggestions; I will look into using a pre-trained BERT model to replace the embedding layer in FConvEncoder.
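
A rough sketch of that idea, assuming the source side is tokenized into BERT wordpieces and using the HuggingFace `transformers` package (the `BertEmbedder` name and `proj` layer are illustrative, not part of the FConvEncoder API):

```python
# Wrap a frozen BERT model so it can stand in for a token-embedding lookup.
# The projection maps BERT's hidden size (768 for bert-base) down to the
# encoder's embedding dimension.
import torch
import torch.nn as nn
from transformers import BertModel

class BertEmbedder(nn.Module):
    def __init__(self, embed_dim: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        for p in self.bert.parameters():
            p.requires_grad = False  # keep BERT frozen; optionally fine-tune
        self.proj = nn.Linear(self.bert.config.hidden_size, embed_dim)

    def forward(self, src_tokens: torch.Tensor) -> torch.Tensor:
        # src_tokens: (batch, seq_len) of BERT wordpiece ids
        hidden = self.bert(input_ids=src_tokens).last_hidden_state
        return self.proj(hidden)  # (batch, seq_len, embed_dim)
```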

robbchu commented 4 years ago

Hi, thanks for the amazing library. I think I have a similar issue. @huihuifan Do you mean that we could modify, say, TransformerEncoder so that it takes the BERT encoder's embeddings as its own input embeddings?
Thank you!

huihuifan commented 4 years ago

Yes, or you could try to incorporate the BERT embeddings as an additional input alongside the embeddings you already have.
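
One way to combine the two inputs, sketched under the same assumptions as above (the `CombinedEmbedder` name is illustrative, not part of fairseq, and it assumes the learned-embedding view and the BERT view of the sentence are aligned to the same length):

```python
# Concatenate the encoder's learned token embeddings with precomputed BERT
# features and project back to embed_dim.
import torch
import torch.nn as nn

class CombinedEmbedder(nn.Module):
    def __init__(self, embed_tokens: nn.Embedding, bert_dim: int, embed_dim: int):
        super().__init__()
        self.embed_tokens = embed_tokens  # the encoder's usual embedding table
        self.combine = nn.Linear(embed_dim + bert_dim, embed_dim)

    def forward(self, src_tokens: torch.Tensor, bert_features: torch.Tensor) -> torch.Tensor:
        # src_tokens: (batch, seq_len); bert_features: (batch, seq_len, bert_dim)
        x = torch.cat([self.embed_tokens(src_tokens), bert_features], dim=-1)
        return self.combine(x)
```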

huihuifan commented 4 years ago

closing this for now!