OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

rolling ppl with sliding window #2553

Closed l-k-11235 closed 5 months ago

l-k-11235 commented 5 months ago

The wikitext2 perplexity calculation method is based on this Hugging Face article:

It is computed with a window size of max_seq_length = 4096 tokens. At each step the window shifts by stride = 512 tokens, and its first max_seq_length - stride tokens are treated as context tokens: their logits are not scored, so each token contributes to the rolling perplexity exactly once, without overlap.
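The windowing logic above can be sketched as follows. This is a hedged illustration, not the actual OpenNMT-py implementation: it assumes per-token log-probabilities are already available as a plain list, and the function name `rolling_perplexity` is hypothetical. Only the tokens not covered by a previous window are scored, mirroring the stride bookkeeping from the Hugging Face article.

```python
import math

def rolling_perplexity(log_probs, max_seq_length, stride):
    """Rolling perplexity over a sliding window (illustrative sketch).

    log_probs: per-token log-probabilities for the full sequence.
    At each step the window of `max_seq_length` tokens shifts by
    `stride`; within a window, tokens already scored by an earlier
    window act as context only, so every token is counted once.
    """
    n = len(log_probs)
    nll, counted, prev_end = 0.0, 0, 0
    for begin in range(0, n, stride):
        end = min(begin + max_seq_length, n)
        trg_len = end - prev_end  # tokens new to this window
        nll -= sum(log_probs[end - trg_len:end])
        counted += trg_len
        prev_end = end
        if end == n:
            break
    return math.exp(nll / counted)
```

With a uniform per-token probability of 0.5 the rolling perplexity is exactly 2.0 regardless of window size or stride, which is a handy sanity check that no token is double-counted.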

I benchmarked llama2-7B with this config:

##############
# transforms #
##############
transforms: [sentencepiece]

###########
# Subword #
###########
src_subword_model: "llama/tokenizer.model"
tgt_subword_model: "llama/tokenizer.model"

#############
# Inference #
#############

# GPU
world_size: 1
gpu_ranks: [0]
gpu: 0

seed: 42
max_length: 10
batch_type: sents
batch_size: 15

report_time: false
beam_size: 1
model: checkpoints/llama-2-7B_safetensors.pt
src: None

By running python3 run_wikitext-2_benchmark.py -config and