Open mzl9039 opened 2 years ago
Try reducing your max sequence length (128, 64, etc., depending on your task) and increasing the dimension of the dense layer.
The word embeddings produced by language models are usually 512- or 768-dimensional, depending on the variant; 64 dimensions is too small to encode a whole sentence/paragraph, and this representational bottleneck gets worse the more homogeneous your data is.
good luck :)
Hi, I'm new to NLP and trying to pre-train a transformer. The default embedding dimension is high, so I added a linear layer following the demo below:
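(The demo itself isn't reproduced in this issue, so here is only a minimal NumPy sketch of the pattern described: a pooled 768-dim sentence embedding projected down to 64 dims by a linear layer followed by tanh, mirroring the `nn.Linear` → `nn.Tanh` dense head mentioned later. All dimensions, weights, and the init scale are illustrative assumptions, not the actual code.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (NOT the real model): a batch of 4 pooled
# 768-dim sentence embeddings, projected to 64 dims, then squashed
# by tanh -- the dense head described in the issue.
pooled = rng.normal(size=(4, 768))
W = rng.normal(0.0, 768 ** -0.5, size=(768, 64))  # hypothetical init scale
b = np.zeros(64)

emb = np.tanh(pooled @ W + b)  # every entry lies strictly in (-1, 1)
print(emb.shape)               # (4, 64)
```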
I use TripletLoss to pre-train the sentence embeddings.
But after pre-training, I found that almost all the sentence embeddings look like [-0.99935, 0.99925, -0.99934, 0.99956, ...]: every entry is close to -1 or 1, which is obviously caused by nn.Tanh. If I remove the dense layer from the model, the embeddings look fine.
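(The ±1 pattern is consistent with tanh saturation: if training pushes the linear layer's pre-activations to even moderately large magnitudes, tanh maps them onto values within a fraction of a percent of ±1. A quick numeric check, with pre-activation values chosen purely for illustration:)

```python
import numpy as np

# tanh saturates quickly: by |x| = 4 the output is already within
# 0.1% of +/-1, matching the observed embedding entries.
pre_activations = np.array([0.5, 2.0, 4.0, 8.0])
print(np.tanh(pre_activations))  # values approach 1 as x grows
```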
Could you help explain this?