Open mzl9039 opened 2 years ago
Try reducing your max sequence length (128, 64, etc., depending on your task) and increasing the dimension of the dense layer.
The word embeddings produced by language models are usually 512- or 768-dimensional, depending on the variant; 64 dimensions is too small to encode a whole sentence/paragraph, and this representational bottleneck gets worse the more homogeneous your data is.
good luck :)
Hi, I'm new to NLP and trying to pre-train a transformer. The default embedding dimension is high, so I added a linear layer following the demo below:
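(The demo itself isn't reproduced in this issue, so here is only a minimal NumPy sketch of the pattern described: a pooled 768-dim sentence embedding projected down to 64 dims by a linear layer followed by tanh, mirroring the `nn.Linear` → `nn.Tanh` dense head mentioned later. All dimensions, weights, and the init scale are illustrative assumptions, not the actual code.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (NOT the real model): a batch of 4 pooled
# 768-dim sentence embeddings, projected to 64 dims, then squashed
# by tanh -- the dense head described in the issue.
pooled = rng.normal(size=(4, 768))
W = rng.normal(0.0, 768 ** -0.5, size=(768, 64))  # hypothetical init scale
b = np.zeros(64)

emb = np.tanh(pooled @ W + b)  # every entry lies strictly in (-1, 1)
print(emb.shape)               # (4, 64)
```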
I use TripletLoss to pre-train the sentence embeddings.
But after pre-training, I found that almost all the sentence embeddings look like [-0.99935, 0.99925, -0.99934, 0.99956, ...]: every entry is close to -1 or 1, which is obviously caused by nn.Tanh. If I remove the dense layer from the model, the embeddings look fine.
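(The ±1 pattern is consistent with tanh saturation: if training pushes the linear layer's pre-activations to even moderately large magnitudes, tanh maps them onto values within a fraction of a percent of ±1. A quick numeric check, with pre-activation values chosen purely for illustration:)

```python
import numpy as np

# tanh saturates quickly: by |x| = 4 the output is already within
# 0.1% of +/-1, matching the observed embedding entries.
pre_activations = np.array([0.5, 2.0, 4.0, 8.0])
print(np.tanh(pre_activations))  # values approach 1 as x grows
```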
Could you help explain this?