ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

What is the purpose of weights named transformer.h.N.attn.masked_bias? #35

Closed drunkinlove closed 3 years ago

drunkinlove commented 3 years ago

I'm trying to use the small GPT-3 checkpoint for finetuning using the DialoGPT repository, but it says the "transformer.h.N.attn.masked_bias" are unexpected. What layers do these weights refer to?

king-menin commented 3 years ago

Please give more information about your code and your transformers version. This error can occur if the transformers version is incorrect.

drunkinlove commented 3 years ago

The DialoGPT repo uses some old version of the GPT2LMHeadModel (possibly custom): https://github.com/microsoft/DialoGPT/blob/master/lsp_model/modeling_gpt2.py#L72

These weights have a fixed value of -10000; what operation are they part of? I'm able to finetune the model after I delete the weights, but I can't help wondering how much it'll affect the learned structure of the model.
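[Editor's note: in the HuggingFace transformers implementation of GPT-2, `masked_bias` is registered as a constant, non-trainable buffer used to implement the causal attention mask, which would explain both the fixed value of -10000 and why finetuning still works after deleting it (the model recreates the constant at initialization). A minimal sketch of that masking operation, assuming the standard HuggingFace-style GPT-2 attention; variable names here are illustrative, not taken from this repo:]

```python
import torch

# Sketch of how a masked_bias-style buffer is used in causal self-attention.
seq_len = 4
attn_weights = torch.randn(seq_len, seq_len)  # raw Q·K^T attention scores
masked_bias = torch.tensor(-1e4)              # constant, non-trainable buffer (-10000)

# Lower-triangular mask: position i may only attend to positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Future positions are overwritten with -10000, so softmax drives them to ~0.
masked = torch.where(causal_mask, attn_weights, masked_bias)
probs = torch.softmax(masked, dim=-1)
```

Because the buffer is a constant rather than a learned parameter, removing it from a checkpoint should not change the learned structure of the model.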

king-menin commented 3 years ago

We support only the Megatron and HuggingFace interfaces. For consistency, use the code in our repo.