facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Data2Vec: missing lm_head in text model #4535

Open stefan-it opened 2 years ago

stefan-it commented 2 years ago

Hi,

after trying to convert my own pretrained model checkpoint to the Transformers library with the following code snippet:

# LM Head
model.lm_head.dense.weight = data2vec_model.encoder.lm_head.dense.weight
model.lm_head.dense.bias = data2vec_model.encoder.lm_head.dense.bias
model.lm_head.layer_norm.weight = data2vec_model.encoder.lm_head.layer_norm.weight
model.lm_head.layer_norm.bias = data2vec_model.encoder.lm_head.layer_norm.bias
model.lm_head.decoder.weight = data2vec_model.encoder.lm_head.weight
model.lm_head.decoder.bias = data2vec_model.encoder.lm_head.bias

it seems that the lm_head is missing from the encoder in the current implementation.
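
To double-check this, the state dict of the checkpoint can be inspected directly. A minimal sketch (checkpoint_best.pt is a placeholder path; fairseq checkpoints store the parameters under the "model" key):

import torch

# Placeholder path to the pretrained data2vec checkpoint
state = torch.load("checkpoint_best.pt", map_location="cpu")["model"]

# List every head-related parameter the checkpoint actually contains
for key, value in state.items():
    if "lm_head" in key or "regression_head" in key:
        print(key, tuple(value.shape))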

Here's a model diff between the officially released text model and my own pretrained one:

Official text model:

  (lm_head): RobertaLMHead(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (layer_norm): FusedLayerNorm(torch.Size([768]), eps=1e-05, elementwise_affine=True)
  )

Pretrained one:

# Missing, but there's a regression head?
  (regression_head): Sequential(
    (0): Linear(in_features=768, out_features=1536, bias=True)                                                       
    (1): GELU(approximate=False)       
    (2): Linear(in_features=1536, out_features=768, bias=True)
  ) 
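
Reading the dump, the regression_head is a plain two-layer MLP. Here is a minimal PyTorch sketch of the module it describes (my reconstruction from the dump above, not the fairseq source; as I understand the data2vec objective, this head regresses the student output onto the teacher's representation targets during pretraining, rather than predicting tokens):

import torch.nn as nn

embed_dim = 768

# Reconstruction of the dumped module: 768 -> 1536 -> 768 with a GELU,
# i.e. a representation-regression head, not a vocabulary-sized LM head
regression_head = nn.Sequential(
    nn.Linear(embed_dim, embed_dim * 2),
    nn.GELU(),
    nn.Linear(embed_dim * 2, embed_dim),
)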

After looking at the Data2VecTextEncoder, it seems that build_lm_head is never called:

https://github.com/facebookresearch/fairseq/blob/5307a0e078d7460003a86f4e2246d459d4706a1d/examples/data2vec/models/data2vec_text.py#L280-L323

However, here's the corresponding call in the RobertaEncoder:

https://github.com/facebookresearch/fairseq/blob/5307a0e078d7460003a86f4e2246d459d4706a1d/fairseq/models/roberta/model.py#L555-L564
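
For reference, the linked RobertaEncoder constructor builds the head roughly like this (paraphrased from the linked lines, so names may differ slightly):

# Inside RobertaEncoder.__init__ (paraphrased from the linked lines):
self.lm_head = self.build_lm_head(
    embed_dim=args.encoder_embed_dim,
    output_dim=len(dictionary),
    activation_fn=args.activation_fn,
    weight=(
        embed_tokens.weight if not args.untie_weights_roberta else None
    ),
)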

My questions are now:

  1. Is this a bug, i.e. is the lm_head really missing from the current implementation?
  2. What is the main intention/function of the introduced regression_head (which does not exist in the released text model)?
  3. If it's not a bug, how can the Transformers conversion be fixed to get #4534 working?

Many thanks in advance!

stefan-it commented 2 years ago

Hi @alexeib, sorry for bothering you again, but could you take a look at the missing lm_head? I could also share the pretrained checkpoint if necessary :hugs:

qwer4107 commented 2 years ago
(screenshot attached: 2022-07-22, 3:32 PM)

I have that problem, too.

I am trying to convert my own pretrained data2vec-text checkpoint to the Hugging Face format, but it looks like there are many parameter mismatches in the code. (Because of an AttributeError, I load the model (Data2VecTextModel) with the data2vec-text code from the fairseq library, not from the transformers library as suggested.)
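
For completeness, this is roughly how the checkpoint can be loaded through fairseq instead (a sketch; checkpoint_best.pt is a placeholder path, and the data2vec_text model class under examples/data2vec may first need to be importable, e.g. via a fairseq user module):

from fairseq import checkpoint_utils

# Placeholder path; the data2vec_text model class must be importable
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["checkpoint_best.pt"]
)
data2vec_model = models[0]
print(data2vec_model)  # shows regression_head, but no encoder.lm_head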

It seems that many of the problematic parts are in 'convert_data2vec_text_original_pytorch_checkpoint_to_pytorch.py'.

Anyway, I'd like to know how to transfer the 'regression_head' and how to deal with the 'lm_head', as @stefan-it asked.