repodiac closed this issue 3 years ago.
Hmm, your pretrained model does not have weights for `['lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias']`. Did you set `masked_language_modeling` to `True` in the config? If so, the model would have been loaded with `AutoModelForMaskedLM` (see here) and I would have expected those weights to have been trained. Still, maybe I am wrong and `lm_head` is not used by your particular model. I think it is still worth evaluating the model you have trained to see whether it performs well on your downstream tasks.
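For illustration only (not the actual DeCLUTR loading code; the model name is simply the base model mentioned in this thread), the difference between the two `Auto*` classes is where those `lm_head.*` parameters live:

```python
from transformers import AutoModel, AutoModelForMaskedLM

name = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"

# Loading through AutoModelForMaskedLM yields an encoder plus an MLM head, i.e. the
# lm_head.* parameters named in the warning (this load itself typically warns that the
# head is newly initialized if the checkpoint does not contain those weights).
mlm = AutoModelForMaskedLM.from_pretrained(name)
print(sorted(n for n, _ in mlm.named_parameters() if n.startswith("lm_head")))

# The plain encoder returned by AutoModel has no such head at all.
enc = AutoModel.from_pretrained(name)
print(sorted(n for n, _ in enc.named_parameters() if n.startswith("lm_head")))  # []
```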
Regarding `declutr.jsonnet`: besides the min/max length issue (see #235), `masked_language_modeling` is indeed set to `true` (just checked):

```jsonnet
"model": {
    "type": "declutr",
    "text_field_embedder": {
        "type": "mlm",
        "token_embedders": {
            "tokens": {
                "type": "pretrained_transformer_mlm",
                "model_name": transformer_model,
                "masked_language_modeling": true
            },
        },
    },
    "loss": {
        "type": "nt_xent",
        "temperature": 0.05,
    },
    // There was a small bug in the original implementation that caused gradients derived from
    // the contrastive loss to be scaled by 1/N, where N is the number of GPUs used during
    // training. This has been fixed. To reproduce results from the paper, set this to false.
    // Note that this will have no effect if you are not using distributed training with more
    // than 1 GPU.
    "scale_fix": false
},
```
However, as I wrote in https://github.com/JohnGiorgi/DeCLUTR/issues/118#issuecomment-927912463, in the continued/restarted runs I loaded the first model via `from_archive`:

```jsonnet
"model": {
    "type": "from_archive",
    "archive_file": "/notebooks/DeCLUTR/output_bs32_ep10/model.tar.gz"
},
```

Is that the problem? According to Hugging Face, the underlying model seems to be `XLMRobertaModel`. Does it not use the referenced `lm_head` weights during training? I doubt it...
Something has definitely been trained :) The embeddings are significantly different from those of the base model (`sentence-transformers/paraphrase-multilingual-mpnet-base-v2`) when used for semantic textual similarity, but I wonder whether I am missing something here, given that the model "complains" in such a manner.
Any clarification is highly appreciated!
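One way to see whether the exported checkpoint actually contains any `lm_head.*` tensors, independent of which `Auto*` class loads it (this assumes the export wrote a `pytorch_model.bin` into the `output_bs32_ep20_export` directory mentioned elsewhere in this thread; adjust the path if it wrote safetensors instead):

```python
import torch

# Inspect the raw checkpoint that save_pretrained_hf.py produced.
state_dict = torch.load("output_bs32_ep20_export/pytorch_model.bin", map_location="cpu")
print([k for k in state_dict if k.startswith("lm_head")])
```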
I think you are free to ignore these messages. I imagine this happens because, somewhere during loading of the model, `AutoModel.from_pretrained` is used, so the weights of `lm_head` are not initialized. That is OK because we don't use them to produce sentence embeddings.
I have to admit that I am not particularly familiar with the underlying `XLMRobertaModel`, but `lm_head` sounds to me like the last hidden layer (in general, you put a task-specific head on top, e.g. a softmax for classification tasks). So for embeddings I would expect `lm_head` to be used as the last layer?
The example code you cited uses mean pooling on the token embeddings from the model's last transformer block. This doesn't require `lm_head`.
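For reference, a minimal sketch of that kind of mean pooling (not the exact example code referenced in this thread; the model path is the exported directory from the issue):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "output_bs32_ep20_export"  # exported model directory from this thread
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)

sentences = ["This is an example sentence.", "Each sentence is converted to a vector."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Token embeddings from the last transformer block; no lm_head involved.
    last_hidden = model(**inputs).last_hidden_state

# Mean pooling over tokens, ignoring padding.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # (2, hidden_size)
```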
Closing this, feel free to re-open if you are still having issues.
I trained an extension of the model `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (see #235). After training I used the script `save_pretrained_hf.py` to convert it to a Hugging Face Transformers-compatible format.
When I now run the example code for mean-pooling embeddings I get the following warning (`output_bs32_ep20_export` is my exported model):
Any idea why this occurs? Is what the warning says true, or can I ignore it?