Open khfs opened 1 week ago
Hi! Have you verified your code without the ChunkLlama monkey patch?
Yes, I commented out line 85 in test_ppl.py and got the same error message. Later, after comparing this code with the eval.py code of LongLoRA, I found that changing `np.uint32` to `np.uint16` on line 98 of test_ppl.py allowed the code to run. However, the result for ChunkLlama2 7B was `{"seq_len": 16384, "gpu": "1", "data_path": "pg19_llama2.validation.bin", "scale": "7b", "pretraining_length": 4096, "ppl": 1803.4413082318101}`. This is obviously not correct, so I don't know what other issues exist in the code.
Thank you for letting me know! I think this issue was caused by mistakenly uploading files tokenized with the Llama 3 tokenizer. I will check it right now.
Hi! Changing

```diff
-data = {'val': np.memmap(data_path, dtype=np.uint32, mode='r')}
+data = {'val': np.memmap(data_path, dtype=np.uint16, mode='r')}
```

works for me. Remember not to comment out this line: `replace_with_chunkllama(args.pretraining_length, args.pretraining_length//4)`
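For context on why the dtype matters: the Llama 2 tokenizer has a 32,000-token vocabulary, so every id fits in `np.uint16`, while Llama 3's 128,256-token vocabulary needs `np.uint32`. Reading a uint16-encoded file as uint32 fuses every two stored tokens into one huge id, which either inflates perplexity into the thousands or crashes the embedding lookup. A minimal sketch of the failure mode (the file name here is illustrative):

```python
import numpy as np

VOCAB_SIZE = 32_000  # Llama 2 vocabulary size; every id fits in uint16

# Six token ids written to disk as uint16, the way the PG19 .bin turned out to be stored
tokens = np.array([1, 15043, 29892, 3186, 29889, 2], dtype=np.uint16)
tokens.tofile("val_demo.bin")

# Correct read: every id is a valid row of the embedding table
good = np.memmap("val_demo.bin", dtype=np.uint16, mode="r")
assert good.max() < VOCAB_SIZE

# Wrong read: each uint32 element fuses two uint16 tokens, producing
# ids far outside the vocabulary -> `IndexError: index out of range in self`
bad = np.memmap("val_demo.bin", dtype=np.uint32, mode="r")
print(bad.max() >= VOCAB_SIZE)  # True
```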
Although this allows the code to run, the result I obtained is `{"seq_len": 16384, "gpu": "1", "data_path": "pg19_llama2.validation.bin", "scale": "7b", "pretraining_length": 4096, "ppl": 1803.4413082318101}`, where the PPL is far too high. Therefore, I believe there is still an issue with the code. I am curious about your results.
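A cheap guard that would have surfaced this mismatch immediately: before evaluating, check that the largest stored token id is actually a valid index into the model's embedding table. `check_token_ids` below is a hypothetical helper, not part of the repo:

```python
import numpy as np

def check_token_ids(data_path, dtype, vocab_size):
    """Fail fast if any stored token id falls outside the embedding table,
    instead of letting it surface as an opaque IndexError (or a nonsense
    perplexity) deep inside model.forward(). Hypothetical helper."""
    ids = np.memmap(data_path, dtype=dtype, mode="r")
    top = int(ids.max())
    if top >= vocab_size:
        raise ValueError(
            f"max token id {top} >= vocab size {vocab_size}: the file was "
            f"probably written with a narrower dtype or another tokenizer")
    return len(ids)
```

For example, calling `check_token_ids("pg19_llama2.validation.bin", np.uint16, model.config.vocab_size)` before building the memmap dict would raise a readable error instead of the embedding crash.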
I have updated the code. Plz try the newest version🤣.
Thank you for your response regarding the validation on PG19 and I am currently testing the latest version of the code. How can I resolve the error related to the passkey retrieval?
`test_passkey.py` has also been updated!
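For anyone unfamiliar with the task: passkey retrieval hides a random number inside long filler text and asks the model to repeat it; accuracy at growing context lengths is the long-context metric. A minimal sketch of the prompt construction (the exact wording in test_passkey.py may differ):

```python
def build_passkey_prompt(n_filler_lines, passkey, depth=0.5):
    """Bury `passkey` at relative position `depth` inside filler text.
    Sketch only; test_passkey.py's actual prompt wording may differ."""
    filler = ("The grass is green. The sky is blue. "
              "The sun is yellow. Here we go. There and back again.")
    key_line = f"The pass key is {passkey}. Remember it. {passkey} is the pass key."
    lines = [filler] * n_filler_lines
    lines.insert(int(n_filler_lines * depth), key_line)  # hide the key mid-context
    header = "There is a pass key hidden inside a lot of irrelevant text. Find it."
    question = "What is the pass key? The pass key is"
    return "\n".join([header, *lines, question])

prompt = build_passkey_prompt(200, passkey=68243)
```

Longer contexts are produced simply by raising `n_filler_lines` until the tokenized prompt reaches the target `--seq_len`.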
I followed the environment setup in the README exactly. When performing perplexity validation on PG19, the only difference from the original code is that I loaded the model from a local path and set the device to 'cpu' to see the exact error messages. My command line was:

```shell
python test_ppl.py --seq_len 16384 --scale 7b --data_path pg19_llama2.validation.bin
```

The terminal output was:
```
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.88it/s]
Test PPL on seq length 16384
  0%|          | 0/9446 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test_ppl.py", line 102, in <module>
    evaluate_ppl_all(seq_length=args.seq_len, sliding_window=256, args=args, model=model, data=data)
  File "test_ppl.py", line 58, in evaluate_ppl_all
    outputs = model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1027, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```

When performing Passkey Retrieval, the only difference from the original code is that I loaded the model from a local path. My command line was:
```shell
python test_passkey.py --seq_len 16384 --scale 7b
```

The terminal output was:

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 11.13it/s]
Traceback (most recent call last):
  File "test_passkey.py", line 123, in <module>
    main(args)
  File "test_passkey.py", line 77, in main
    model = load_checkpoint_and_dispatch(model, checkpoint=model_path,
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/accelerate/big_modeling.py", line 607, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 1705, in load_checkpoint_in_model
    raise ValueError(
ValueError: /data3/xylu/checkpoints/NousResearch/Llama-2-7b-hf containing more than one `.index.json` file, delete the irrelevant ones.
```
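That error means the local checkpoint folder holds index files for two shard formats at once (typically `pytorch_model.bin.index.json` alongside `model.safetensors.index.json`), and accelerate refuses to guess which one to use. A hedged cleanup sketch; `keep_one_index` is a hypothetical helper that renames the extras rather than deleting them, and `keep` should name whichever shard format you actually want to load:

```python
from pathlib import Path

def keep_one_index(checkpoint_dir, keep="model.safetensors.index.json"):
    """Move extra *.index.json files aside so load_checkpoint_and_dispatch
    sees exactly one. Hypothetical helper; pick `keep` to match the shard
    format (safetensors vs. pytorch .bin) present in the folder."""
    ckpt = Path(checkpoint_dir)
    for idx in ckpt.glob("*.index.json"):
        if idx.name != keep:
            idx.rename(idx.with_name(idx.name + ".bak"))  # backup, not delete
    return sorted(p.name for p in ckpt.glob("*.index.json"))
```

Running it on the local checkpoint directory should leave a single index file behind; the `.bak` copies can be restored at any time.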