I'll see if I can fix the warning, but you should not worry: after we load XLM-R from Hugging Face, we reload the final weights from our final checkpoints.
To be fair, this is a bit inefficient, as we are loading XLM-R twice, but what counts in the end are the weights from the checkpoint downloaded from the list of available models.
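A minimal sketch of that two-step loading, in case it clarifies where the warning comes from. The checkpoint path and the `encoder.model.` state-dict prefix here are illustrative assumptions, not guaranteed to match COMET's actual internals:

```python
import torch
from transformers import XLMRobertaModel

# Step 1: build the encoder from the Hugging Face checkpoint.
# The "weights ... were not used" warnings come from this step.
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")

# Step 2: overwrite the encoder weights with the fine-tuned ones
# from the downloaded COMET checkpoint (illustrative path and keys).
ckpt = torch.load("checkpoints/model.ckpt", map_location="cpu")
prefix = "encoder.model."  # assumed key prefix inside the checkpoint
encoder_state = {
    key[len(prefix):]: value
    for key, value in ckpt["state_dict"].items()
    if key.startswith(prefix)
}
encoder.load_state_dict(encoder_state)
```

Whatever the warnings say during step 1, the weights that are actually used are the ones loaded in step 2.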
Thanks for the reply! I want to ask one more question: when evaluating English->Chinese, should the source/hypothesis/reference files be tokenized or detokenized?
The text should always be detokenized; we run our own tokenization (in most cases, the XLM-R tokenizer).
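If your files were pre-tokenized (for example with the Moses tokenizer), detokenize them before scoring. A small sketch using sacremoses; assuming Moses-style tokenization here is my own example, not a COMET requirement:

```python
from sacremoses import MosesDetokenizer

detok = MosesDetokenizer(lang="en")

# Moses-style tokenization leaves punctuation as separate tokens:
tokens = ["Hello", ",", "world", "!"]

# COMET should receive the detokenized form instead:
print(detok.detokenize(tokens))  # -> "Hello, world!"
```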
By the way, as a sanity check I just tried:

```python
from transformers import XLMRobertaModel

model = XLMRobertaModel.from_pretrained("xlm-roberta-large")
```
and the warnings are exactly the same. This is the expected behaviour: the xlm-roberta-large checkpoint on Hugging Face belongs to the XLMRobertaForMaskedLM model class, but I don't care about the prediction head in COMET, so I instead initialize a plain XLMRobertaModel (the base architecture, without the LM head).
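For comparison, loading the same checkpoint into the class it was trained as consumes the lm_head.* weights, so those particular warnings should disappear. A sanity-check sketch using the same public transformers API as above:

```python
from transformers import XLMRobertaForMaskedLM, XLMRobertaModel

# The class xlm-roberta-large was trained as: the lm_head.* weights
# have a destination here, so they load without complaint.
mlm = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-large")

# The plain encoder COMET uses: lm_head.* has nowhere to go,
# hence the "weights ... were not used" warning.
base = XLMRobertaModel.from_pretrained("xlm-roberta-large")
```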
Great! That helps a lot!
❓ Questions and Help
What is your question?
When I run `comet-score -s test.en-zh.en -t decoder-out -r test.en-zh.zh`, I get the following warnings. Is this normal, or am I missing something?
```
/root/.cache/torch/unbabel_comet/wmt20-comet-da//checkpoints/model.ckpt
Some weights of the model checkpoint at xlm-roberta-large were not used when initializing XLMRobertaModel: ['lm_head.bias', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Encoder model frozen.
/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/container.py:435: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")
GPU available: True, used: True
```
What's your environment?