UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

SimCSE dropout parameter #2634

Open riyajatar37003 opened 2 months ago

riyajatar37003 commented 2 months ago

I am trying to understand where exactly the dropout is applied to get two representations of the same input text in this example: https://github.com/UKPLab/sentence-transformers/blob/master/examples/unsupervised_learning/SimCSE/README.md

thanks

tomaarsen commented 2 months ago

Hello!

The dropout already exists in the underlying transformers model and is activated when the model is in train mode. For example:

from sentence_transformers import SentenceTransformer, models

# Define your sentence transformer model using mean pooling
model_name = "distilroberta-base"
transformer = models.Transformer(model_name)
pooling_model = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling_model])

print(transformer.auto_model)
"""
RobertaModel(
  (embeddings): RobertaEmbeddings(
    (word_embeddings): Embedding(50265, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (token_type_embeddings): Embedding(1, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): RobertaEncoder(
    (layer): ModuleList(
      (0-5): 6 x RobertaLayer(
        (attention): RobertaAttention(
          (self): RobertaSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): RobertaSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): RobertaIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): RobertaOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): RobertaPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)
"""

The model.encode call ensures that the model is in eval mode, while the fit method ensures that it's in train mode. If I remove the self.eval() call in model.encode, encode the same sentence twice, and compute the cosine similarity between the two embeddings, I get:

tensor([[0.9942]])
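
For reference, here is a minimal sketch of an equivalent check that does not require patching encode: keep the model in train mode and run the forward pass directly. The sentence text here is illustrative; the model is built the same way as in the snippet above.

import torch
from sentence_transformers import SentenceTransformer, models, util

model_name = "distilroberta-base"
transformer = models.Transformer(model_name)
pooling_model = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling_model])

# train mode keeps the Dropout modules active (model.encode would normally call self.eval())
model.train()

# Tokenize the same sentence twice and run the forward pass manually
features = model.tokenize(["This is an example sentence."] * 2)
with torch.no_grad():
    embeddings = model(features)["sentence_embedding"]

# Dropout is sampled independently for each pass, so the two embeddings differ slightly
print(util.cos_sim(embeddings[0:1], embeddings[1:2]))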

You'll get more drastic changes if you increase the dropout, e.g. by updating p on all Dropout modules (a sketch follows after the output below). This is with p=0.3:

tensor([[0.9729]])
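
A sketch of how one could raise the dropout probability on every Dropout module in the model; set_dropout is a hypothetical helper, not part of sentence-transformers, and model is the one from the snippets above:

import torch.nn as nn

def set_dropout(module: nn.Module, p: float) -> None:
    # Walk all submodules and update the probability on every nn.Dropout
    for submodule in module.modules():
        if isinstance(submodule, nn.Dropout):
            submodule.p = p

# e.g. raise dropout from the default 0.1 to 0.3
set_dropout(model, 0.3)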

Note that the SimCSE paper uses 0.1 dropout, which is exactly the default for many transformers models already.

riyajatar37003 commented 2 months ago

thank you so much.