FranxYao / Long-Context-Data-Engineering

Implementation of paper Data Engineering for Scaling Language Models to 128K Context

Error in Llama-3.2-3B needle evaluation #18

Open prakamya-mishra opened 3 weeks ago

prakamya-mishra commented 3 weeks ago

Hi @FranxYao, if I use your code to evaluate the Llama-3.2-3B model, specifically:

scaling_factor = 10 # hardcode
reset_rope(self.model_to_test, model_max_train_len=81920, scaling_factor=scaling_factor)

It throws the following error:

AttributeError: 'LlamaRotaryEmbedding' object has no attribute '_set_cos_sin_cache'
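From what I can tell, `reset_rope` relies on `_set_cos_sin_cache`, which existed on `LlamaRotaryEmbedding` in older transformers releases but was removed when rotary embeddings were refactored to compute cos/sin on the fly, so there is nothing for the patch to call on recent versions. As a workaround I guarded the call like this (a rough sketch; the version behavior is my own reading, not something stated in this repo):

from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

# Apply the legacy RoPE patch only when the old cached implementation exists;
# on newer transformers the method is gone and reset_rope fails as above.
if hasattr(LlamaRotaryEmbedding, "_set_cos_sin_cache"):
    scaling_factor = 10  # hardcode
    reset_rope(self.model_to_test, model_max_train_len=81920, scaling_factor=scaling_factor)
else:
    print('reset_rope skipped: transformers no longer exposes _set_cos_sin_cache')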

So if I comment this part out (effectively the else branch in the sketch above), then I get the following results: [image: Llama-3.2-3B needle-in-a-haystack results]

This is unexpected, as the Llama-3.2-3B model claims to support a context length of up to 128K. Do you also get this error, or how do you handle it?

What is the correct way to evaluate the Llama-3.2-3B model?

I downloaded the Llama model using:

from huggingface_hub import snapshot_download

snapshot_download(repo_id='meta-llama/Llama-3.2-3B',
                  local_dir='<path>/Llama-3.2-3B',
                  repo_type='model',
                  local_dir_use_symlinks=False)
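For what it's worth, inspecting the downloaded config suggests long context is already baked into the checkpoint. The sketch below just reads standard config.json fields; if I read them correctly, Llama-3.2 ships max_position_embeddings=131072 together with a "llama3"-type rope_scaling entry:

import json

# Diagnostic sketch: check what context length the checkpoint itself claims.
with open('<path>/Llama-3.2-3B/config.json') as f:
    cfg = json.load(f)
print(cfg.get('max_position_embeddings'))  # expected: 131072, i.e. 128K
print(cfg.get('rope_scaling'))             # expected: a "llama3"-type entry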

And the run command is:

(
python -u needle_in_haystack.py --s_len 0 --e_len 128000 \
    --model_provider LLaMA \
    --model_path <path>/Llama-3.2-3B
) 2>&1 | tee logs/Llama-3_2-3B.log
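In case it matters for reproducing: once reset_rope is skipped, I believe the model ends up being loaded with the checkpoint's own rope_scaling untouched, along these lines (a minimal sketch, assuming a recent transformers plus accelerate for device_map; the dtype and device choices are mine, not from the script):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<path>/Llama-3.2-3B'
tokenizer = AutoTokenizer.from_pretrained(model_path)
# No RoPE patching: keep the checkpoint's native "llama3" rope_scaling,
# which is what advertises the 128K context in the first place.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)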