ashokrajab closed this issue 5 months ago
Yes, this is possible - I will share models for this soon (~2 weeks), please stay tuned! Will update you here :)
Sorry it took me a bit longer, but I just released decoder LMs trained for embeddings with bidirectional attention - this is essentially the v2 of sgpt:
Hope this is useful!
If I wanted to generate an embedding for a sentence using a decoder, must it necessarily use causal attention?
e.g., "This is a sample sentence."
Let's say each word is a token. Now, instead of sending in each token one by one with a causal attention mask to get the token embeddings, and then doing a position-weighted mean pooling to get the sentence embedding...
why can't we give the entire sentence all at once and apply full (bidirectional) self-attention to get the sentence embedding? (See the sketch below.)
I get that we are trying to stick to the logic followed during training, but I'm just wondering whether something like this should work.
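For concreteness, here is a minimal PyTorch sketch of the two options being compared. The token vectors are random stand-ins for a real decoder's hidden states, and the position-weighted mean (weights proportional to the position index) follows the pooling scheme described in the SGPT paper; everything else is illustrative, not the repo's actual code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the per-token hidden states of
# "This is a sample sentence ." inside one attention layer.
seq_len, dim = 6, 16
q = k = v = torch.randn(1, 1, seq_len, dim)  # (batch, heads, seq, dim)

# Option 1: causal attention - token i only attends to tokens <= i,
# matching how a GPT-style decoder was pretrained.
causal_states = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Option 2: full self-attention - every token attends to every token,
# which is what the question proposes.
bidir_states = F.scaled_dot_product_attention(q, k, v, is_causal=False)

def position_weighted_mean(states):
    """SGPT-style weighted mean pooling: later positions get larger
    weights (w_i proportional to i) because, under causal attention,
    later tokens have attended to more of the sentence."""
    s = states.squeeze(0).squeeze(0)                    # (seq, dim)
    w = torch.arange(1, s.shape[0] + 1, dtype=s.dtype)  # 1, 2, ..., seq
    w = w / w.sum()
    return (s * w.unsqueeze(-1)).sum(dim=0)             # (dim,)

causal_embedding = position_weighted_mean(causal_states)

# Under bidirectional attention every position is equally informed,
# so a plain (unweighted) mean is the natural pooling choice.
bidir_embedding = bidir_states.squeeze(0).squeeze(0).mean(dim=0)

print(causal_embedding.shape, bidir_embedding.shape)  # both torch.Size([16])
```

The catch is the train/inference mismatch raised in the question itself: a model pretrained with causal masking has never produced hidden states under full self-attention, so the bidirectional variant only works well once the model is also trained that way - which appears to be exactly what the newly released models above do.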