Hello!
The dropout already exists in the underlying `transformers` model and is activated when the model is in `train` mode. For example:
```python
from sentence_transformers import SentenceTransformer, models

# Define your sentence transformer model using mean pooling
model_name = "distilroberta-base"
transformer = models.Transformer(model_name)
pooling_model = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling_model])

print(transformer.auto_model)
"""
RobertaModel(
  (embeddings): RobertaEmbeddings(
    (word_embeddings): Embedding(50265, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (token_type_embeddings): Embedding(1, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): RobertaEncoder(
    (layer): ModuleList(
      (0-5): 6 x RobertaLayer(
        (attention): RobertaAttention(
          (self): RobertaSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): RobertaSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): RobertaIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): RobertaOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): RobertaPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)
"""
```
The `model.encode` call ensures that the model is in `eval` mode, while the `fit` method ensures that it's in `train` mode. If I remove the `self.eval()` call in `model.encode` and then encode the same sentence twice and compare the two embeddings, I get:
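A minimal sketch of such a check (the exact snippet isn't preserved here; the test sentence and the `util.cos_sim` comparison are assumptions):

```python
from sentence_transformers import util

# Assumes the internal self.eval() call in SentenceTransformer.encode has
# been removed, so the model can stay in train mode (dropout active).
model.train()
embeddings = model.encode(["This is a test sentence."] * 2, convert_to_tensor=True)

# Cosine similarity between two encodings of the identical sentence:
print(util.cos_sim(embeddings[0], embeddings[1]))
```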
```
tensor([[0.9942]])
```
You'll get more drastic changes if you increase the dropout, e.g. by updating `p` on all `Dropout` modules (a sketch follows below the output). This is with `p=0.3`:
```
tensor([[0.9729]])
```
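One way to do that update, as a minimal sketch (assuming the `transformer` from the snippet above):

```python
import torch.nn as nn

# Raise the dropout probability on every Dropout module in the
# underlying transformers model.
for module in transformer.auto_model.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.3
```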
Note that the SimCSE paper uses 0.1 dropout, which is exactly the default for many `transformers` models already.
Thank you so much. I am trying to understand where exactly the dropout is applied to get two representations of the same input text in this example: https://github.com/UKPLab/sentence-transformers/blob/master/examples/unsupervised_learning/SimCSE/README.md

Thanks!
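For reference, the part of that example I'm looking at, roughly (a condensed sketch; the sentences and batch size are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distilroberta-base")

# Each sentence is paired with itself; during fit() the model is in train
# mode, so the two forward passes see different dropout masks and yield
# two different representations of the same input text.
sentences = ["Sample sentence one.", "Sample sentence two."]
train_examples = [InputExample(texts=[s, s]) for s in sentences]
train_dataloader = DataLoader(train_examples, batch_size=2, shuffle=True)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```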