UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

How can I pass bert embeddings to a bi-lstm layer #834

Closed: boscoj2008 closed this issue 2 years ago

boscoj2008 commented 3 years ago

I have been trying to figure out how to build sentence embeddings that leverage a bi-LSTM layer. Starting from the example in training_stsbenchmark_bilstm.py (in the "average_word_embeddings" folder), I tried to adapt train_nli.py by adding an LSTM layer, but I get this error:

    File "../lib/python3.6/site-packages/sentence_transformers/models/LSTM.py", line 30, in forward
        sentence_lengths = torch.clamp(features['sentence_lengths'], min=1)
    KeyError: 'sentence_lengths'
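For reference, the module stack I am building looks roughly like this (a sketch, not my exact script; the model name and hidden size are just placeholders):

```python
from sentence_transformers import SentenceTransformer, models

# BERT word embeddings, followed by a bi-LSTM and mean pooling
word_embedding_model = models.Transformer('bert-base-uncased')  # placeholder model name
lstm = models.LSTM(
    word_embedding_dimension=word_embedding_model.get_word_embedding_dimension(),
    hidden_dim=1024,  # placeholder hidden size
)
pooling_model = models.Pooling(
    lstm.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, lstm, pooling_model])
```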

Can anybody help? I'm stuck :(

nreimers commented 3 years ago

Hi, what you would need to do is to create a new layer class that extracts the sentence lengths from the BERT tokenizer / the BERT attention mask.
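A minimal sketch of such a layer (the class name is made up; it assumes the preceding models.Transformer module has already put attention_mask into the features dict, which it does):

```python
from torch import nn

class SentenceLengths(nn.Module):
    """Hypothetical glue module: sits between models.Transformer and models.LSTM
    and adds the 'sentence_lengths' entry that LSTM.forward expects."""

    def forward(self, features):
        # attention_mask is 1 for real tokens and 0 for padding, so the
        # per-sentence sum gives the number of tokens in each sentence.
        features['sentence_lengths'] = features['attention_mask'].sum(dim=1).long()
        return features
```

You would then place it in the module list between the Transformer and the LSTM, e.g. SentenceTransformer(modules=[word_embedding_model, SentenceLengths(), lstm, pooling_model]).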

However, adding an LSTM on top of BERT does not really make sense. The Transformer is a much more powerful model than a bi-LSTM, so adding one on top is just a waste of compute.

boscoj2008 commented 3 years ago

@nreimers Oh, thanks. In that case, could you advise me on how to solve my problem?

I have fine-tuned Sentence-BERT on SNLI and MultiNLI (AllNLI) data, loaded the trained model, and encoded my records to get 768-dimensional vectors. I then apply PCA to the embeddings, cluster them, and evaluate; however, the recall values are always below 45%. Precision is above 80%, which is satisfactory. InferSent gives me better results (recall of 70% or more, precision of 85%), but I was hoping that BERT's Transformer architecture would give better embeddings.
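Roughly, my pipeline looks like this (a sketch; the model path, the input records, the PCA dimensionality, and the cluster count are placeholders, not my real settings):

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# placeholder data; in my real setup these are the records I want to cluster
records = ["first record text", "second record text", "third record text", "fourth record text"]

model = SentenceTransformer('output/bert-base-allnli')   # hypothetical path to my fine-tuned model
embeddings = model.encode(records)                        # 768-dimensional sentence vectors

reduced = PCA(n_components=2).fit_transform(embeddings)  # PCA dimensionality is a placeholder
labels = AgglomerativeClustering(n_clusters=2).fit_predict(reduced)
# recall / precision are then computed against my labelled data
```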

Any advice will be appreciated. Thanks.

nreimers commented 3 years ago

Sorry, cannot help there.

But PCA is usually not needed when you run clustering.