allenai / unifew

Unifew: Unified Fewshot Learning Model

Embedding size of the "allenai/unifiedqa-t5-large" model from the meta-trained checkpoint? #3

Closed · MicPie closed this issue 2 years ago

MicPie commented 2 years ago

Hi unifew and flex authors,

I'm currently setting up a fine-tuning pipeline for the Hugging Face "allenai/unifiedqa-t5-large" model, which I want to run through the flex evaluation with the supplied test.py setup outlined in https://github.com/allenai/unifew#meta-testing-on-flex.

For my fine-tuning I'm using an HF Trainer setup to obtain my weights. Then I take your supplied checkpoint and replace only the state_dict with my weights, so that I can use the test.py setup (it needs the additional keys in the checkpoint).
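
Roughly, the swap looks like this (the path to my fine-tuned model is just a placeholder, and I'm assuming the released checkpoint keeps its weights under the 'state_dict' key with a "model." prefix, as in the file from your readme):

import torch
from transformers import AutoModelForSeq2SeqLM

# checkpoint supplied in the unifew readme (keeps the extra keys that test.py needs)
ckpt = torch.load('unifew-meta-trained.ckpt', map_location='cpu')

# my fine-tuned model, saved with the HF Trainer (placeholder path)
model = AutoModelForSeq2SeqLM.from_pretrained('path/to/my-finetuned-unifiedqa-t5-large')

# replace only the weights; the unifew checkpoint prefixes parameter names with "model."
ckpt['state_dict'] = {f'model.{k}': v for k, v in model.state_dict().items()}

torch.save(ckpt, 'unifew-finetuned.ckpt')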

However, my weight state dict based on the HF "allenai/unifiedqa-t5-large" model differs in shape for the following parameters:

model.shared.weight
model.encoder.embed_tokens.weight
model.decoder.embed_tokens.weight
model.lm_head.weight

The checkpoint from https://github.com/allenai/unifew#meta-trained-checkpoint has a shape of [32128, 1024] for the above parameters, whereas my HF model setup has a shape of [32100, 1024], i.e. 28 tokens fewer. Do you know what the reason for that could be? Maybe some special tokens that were added? And do you know how I can best solve it?
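
For reference, this is the kind of check I'm running (again, the path to my model is only a placeholder):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# original tokenizer vs. my fine-tuned model (placeholder path)
tokenizer = AutoTokenizer.from_pretrained('allenai/unifiedqa-t5-large')
model = AutoModelForSeq2SeqLM.from_pretrained('path/to/my-finetuned-unifiedqa-t5-large')

print(len(tokenizer))                                # 32100
print(model.get_input_embeddings().weight.shape[0])  # 32100 for my weights, vs. 32128 in your checkpoint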

Also, just to be sure: does "Meta-testing" in this section, https://github.com/allenai/unifew#meta-testing-on-flex, refer only to testing without meta-training, or am I wrong here?

I'm looking forward to your feedback, as we want to include the flex evaluation in an upcoming publication. :-)

Thank you & kind regards, Michael

armancohan commented 2 years ago

Hi Michael,

I can't seem to reproduce this. Looks like the HF checkpoint has the same token embedding size as the unifew meta-trained checkpoint:

In [1]: from transformers import AutoModelForSeq2SeqLM

In [2]: model = AutoModelForSeq2SeqLM.from_pretrained('allenai/unifiedqa-t5-large')

In [3]: assert model.shared.weight.shape == model.encoder.embed_tokens.weight.shape == model.decoder.embed_tokens.weight.shape == model.lm_head.weight.shape

In [4]: model.shared.weight.shape
Out[4]: torch.Size([32128, 1024])

# download checkpoint: `wget https://fleet-public.s3.us-west-2.amazonaws.com/unifew-meta-trained.ckpt`
# load the meta-trained checkpoint
In [5]: import torch

In [6]: meta_trained_ckpt = torch.load('unifew-meta-trained.ckpt', map_location='cpu')

In [7]: meta_trained_ckpt['state_dict']['model.encoder.embed_tokens.weight'].shape
Out[7]: torch.Size([32128, 1024])

Could this be an environment issue on your end?

Regarding this question:

Also, just to be sure: does "Meta-testing" in this section, https://github.com/allenai/unifew#meta-testing-on-flex, refer only to testing without meta-training, or am I wrong here?

You can "Meta-test" a model that you have previously "meta-trained". You can simply provide the additional argument model.ckpt_path=/full/path/to/checkpoint.ckpt to the test.py script, as explained in the readme. We use the "meta-test" terminology according to prior work, but you can just treat it as normal testing.

Let us know if you have more questions!

MicPie commented 2 years ago

Hi Arman, thank you very much for your feedback! :-) I found my "bug": I was using an HF training script that resized the embeddings, and I somehow overlooked that part even though I checked the code multiple times. 🤦 Thank you for your help.
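
In case someone else runs into this: the culprit in my case was the standard embedding-resize step that many HF training scripts include, roughly:

model.resize_token_embeddings(len(tokenizer))  # shrinks the embedding matrix to the tokenizer's 32100 entries, dropping the 28 extra rows of the 32128-row checkpoint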