bigscience-workshop / metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
Apache License 2.0

Add evaluation pipeline #165

Closed · ppommer closed 2 years ago

ppommer commented 2 years ago

This adds a script for evaluating our models on different kinds of metadata.

The steps include (sketched in code below):

  1. Loading a model locally or from the Hugging Face Hub
  2. Looping over the validation datasets and their examples,
    • considering only examples with a single metadata entry
    • skipping examples that exceed 1024 tokens (the maximum sequence length for GPT-2)
    • calculating the perplexity (weighted by the number of tokens)
  3. Writing the results into an output file
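
For readers of this thread, a minimal sketch of such a loop might look like the following. The `gpt2` checkpoint, the `validation_sets` dict, and the `weighted_perplexity` helper are placeholders for illustration, not the actual script:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_LENGTH = 1024  # maximum sequence length for GPT-2

# "gpt2" stands in for the actual checkpoint under evaluation.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def weighted_perplexity(texts):
    """Perplexity over raw text examples, weighted by token count."""
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        if ids.size(1) > MAX_LENGTH:
            continue  # skip examples exceeding the model's context window
        with torch.no_grad():
            # Passing labels=ids yields the mean cross-entropy over the
            # ids.size(1) - 1 predicted positions.
            loss = model(ids, labels=ids).loss
        n_predicted = ids.size(1) - 1
        total_nll += loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)

# Hypothetical validation data keyed by metadata type.
validation_sets = {"url": ["Example text with a URL entry."]}

with open("results.txt", "w") as f:
    for name, texts in validation_sets.items():
        f.write(f"{name}\t{weighted_perplexity(texts)}\n")
```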

Open TODO: mask out the logits of the tokens that indicate the start of local metadata (otherwise our model is disadvantaged compared to the simple baseline, since it reserves probability mass for metadata).
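
One way to implement that masking is sketched below, with `metadata_start_ids` standing in for the real special-token IDs. It sets the relevant logits to `-inf` rather than literally zero (a zero logit still carries probability after the softmax), so the remaining probability mass is renormalized:

```python
import torch
import torch.nn.functional as F

def nll_without_metadata_tokens(model, input_ids, metadata_start_ids):
    """Mean NLL after removing metadata-start tokens from the predicted
    distribution; the softmax renormalizes the remaining mass."""
    logits = model(input_ids).logits[:, :-1, :]      # predictions for positions 1..n
    logits[..., metadata_start_ids] = float("-inf")  # drop reserved probability mass
    log_probs = F.log_softmax(logits, dim=-1)
    targets = input_ids[:, 1:]
    return -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).mean()
```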

@timoschick @cccntu

ppommer commented 2 years ago

@tianjianjiang, I think there are some package version upgrades required for the tests, could you help?

tianjianjiang commented 2 years ago

> @tianjianjiang, I think there are some package version upgrades required for the tests, could you help?

@ppommer, yes, I will do it in a couple of hours. The same issue has affected my plan to fix other bugs, too.


Update: @ppommer, it should work now once your branch is rebased onto (or merged with) the new master revision. Sorry for the inconvenience.

ppommer commented 2 years ago

@tianjianjiang, it's working, thanks!

cccntu commented 1 year ago

Note: the following code can be used to load a model from a subfolder in the Hugging Face Hub. This is handy for people who run the script (no need to download the model files beforehand).

```python
from transformers import AutoModelForCausalLM

# Load a specific checkpoint directly from a subfolder of the Hub repository.
model = AutoModelForCausalLM.from_pretrained(
    "bs-modeling-metadata/checkpoints_v0.4",
    subfolder="checkpoint-10000step",
)
```
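
The `subfolder` argument of `from_pretrained` resolves the path inside the Hub repository, so the checkpoint files are fetched and cached automatically on first use; no manual download or git checkout is needed.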