Multilingual Sentence & Image Embeddings with BERT
SimCSE dropout parameter #2634

riyajatar37003 commented 2 months ago

I am trying to understand where exactly the dropout is applied to get two representations of the same input text in this example.


tomaarsen commented 2 months ago


The dropout already exists in the underlying transformers model and is activated when the model is in train mode. For example:

from sentence_transformers import SentenceTransformer, models

# Define your sentence transformer model using mean pooling
model_name = "distilroberta-base"
transformer = models.Transformer(model_name)
pooling_model = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling_model])

# Print the underlying transformers model to see where its Dropout layers sit
print(model[0].auto_model)

RobertaModel(
  (embeddings): RobertaEmbeddings(
    (word_embeddings): Embedding(50265, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (token_type_embeddings): Embedding(1, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): RobertaEncoder(
    (layer): ModuleList(
      (0-5): 6 x RobertaLayer(
        (attention): RobertaAttention(
          (self): RobertaSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): RobertaSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): RobertaIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): RobertaOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): RobertaPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)

The model.encode call ensures that the model is in eval mode, while the fit method ensures that it's in train mode. If I remove the self.eval() call in model.encode and then do:
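
A minimal sketch of the train/eval difference, assuming a small stand-in PyTorch module in place of the real model so that nothing has to be downloaded. The `encoder` and `features` names are hypothetical; only `nn.Dropout`, `.train()` and `.eval()` are the actual mechanism at work:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in encoder (an assumption for illustration): a linear layer followed by
# the same kind of Dropout(p=0.1) that appears in the printed model above.
encoder = nn.Sequential(nn.Linear(768, 768), nn.Dropout(p=0.1))
features = torch.ones(1, 768)  # stands for the same input text, already embedded

# eval mode (what model.encode enforces): dropout is a no-op, so two passes agree
encoder.eval()
with torch.no_grad():
    eval_a, eval_b = encoder(features), encoder(features)

# train mode (what fit enforces): every forward pass samples a fresh dropout mask
encoder.train()
with torch.no_grad():
    train_a, train_b = encoder(features), encoder(features)

print(torch.equal(eval_a, eval_b))    # True: identical representations
print(torch.equal(train_a, train_b))  # False: two different "views" of one input
```

With the real model, the same effect should be reachable without editing `encode`, for instance by calling `model.train()` and then running `model(model.tokenize([text]))["sentence_embedding"]` twice, since that forward pass does not switch the model back to eval mode.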


You'll get more drastic changes if you increase the dropout, e.g. by updating p on all Dropout modules. This is with 0.3:


Note that the SimCSE paper uses 0.1 dropout, which is exactly the default for many transformers models already.
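
If you do want a different rate, the update of p on all Dropout modules mentioned above could look roughly like this. It is a sketch under assumptions: `set_dropout` is a hypothetical helper, not part of sentence-transformers, and `toy` is a stand-in module used so the snippet runs without downloading a model.

```python
from torch import nn


def set_dropout(module: nn.Module, p: float) -> int:
    """Set p on every nn.Dropout inside `module` and return how many were changed."""
    changed = 0
    for sub in module.modules():  # walks the module tree recursively
        if isinstance(sub, nn.Dropout):
            sub.p = p
            changed += 1
    return changed


# Stand-in model (an assumption for illustration) with two nested Dropout layers
toy = nn.Sequential(
    nn.Linear(4, 4),
    nn.Dropout(p=0.1),
    nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.1)),
)

changed = set_dropout(toy, 0.3)
rates = [m.p for m in toy.modules() if isinstance(m, nn.Dropout)]
print(changed, rates)  # 2 [0.3, 0.3]
```

The same helper could be pointed at the SentenceTransformer object itself, e.g. `set_dropout(model, 0.3)`, because it is a regular PyTorch module and `modules()` reaches the Dropout layers shown in the printout.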

riyajatar37003 commented 2 months ago

thank you so much.