Open SherrySwift opened 7 months ago
Hi, Thanks for your question. Did you use Llama-2-7b? The model used in the paper is "huggyllama/llama-7b".
Hi, I used huggyllama/llama-7b, but I encountered the following error when I tried to run scripts/summarization/eval.sh:
Traceback (most recent call last):
File "/data1/H2O-main/h2o_hf/run_summarization.py", line 138, in <module>
output_sequences = model.generate(
File "/usr/local/miniconda3/envs/atom/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data1/LLM/transformers/src/transformers/generation/utils.py", line 1719, in generate
return self.sample(
File "/data1/LLM/transformers/src/transformers/generation/utils.py", line 2837, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
When I load other models like Llama-2-7b, there is no such error. Do you have any ideas about it? Thanks a lot!
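For context, this RuntimeError almost always means the logits fed into sampling already contain `inf`/`nan` (e.g. an fp16 overflow), so every entry of the probability tensor that reaches torch.multinomial is invalid. A minimal diagnostic sketch (the helper name is hypothetical):

```python
import torch

def check_logits(logits: torch.Tensor) -> bool:
    # Hypothetical diagnostic: inspect the last-step logits that feed
    # torch.multinomial inside sample(). Non-finite values here are the
    # usual cause of "probability tensor contains either `inf`, `nan` ...".
    probs = torch.softmax(logits, dim=-1)
    ok = bool(torch.isfinite(probs).all())
    if not ok:
        n_bad = (~torch.isfinite(logits)).sum().item()
        print(f"{n_bad} non-finite logits out of {logits.numel()}")
    return ok

# An fp16-style overflow produces inf, which softmax turns into nan:
assert check_logits(torch.tensor([[1.0, 2.0, 3.0]]))
assert not check_logits(torch.tensor([[1.0, 2.0, float("inf")]]))
```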
Hi, could you provide the detailed command and the transformers version you used? I couldn't reproduce the issue on my side when using huggyllama/llama-7b.
Thanks for your reply.
Here is the command:
bash scripts/summarization/eval.sh xsum 5 full 0
The contents in scripts/summarization/eval.sh are:
task=$1
shots=$2
method=$3
GPU=$4
HH_SIZE=$5
RECENT_SIZE=$6
if [[ ${method} == 'h2o' ]]; then
    CUDA_VISIBLE_DEVICES=${GPU} python -u run_summarization.py \
        --input_path data/summarization_data/${task}_${shots}shot.jsonl \
        --output_path summary_results/${task}_${shots}shot_h2o_hh${HH_SIZE}_local${RECENT_SIZE}.jsonl \
        --model_name huggyllama/llama-7b \
        --hh_size ${HH_SIZE} \
        --recent_size ${RECENT_SIZE} \
        --cache_dir ../../llm_weights \
        --enable_h2o_cache
elif [[ ${method} == 'full' ]]; then
    CUDA_VISIBLE_DEVICES=${GPU} python -u run_summarization.py \
        --input_path data/summarization_data/${task}_${shots}shot.jsonl \
        --output_path summary_results/${task}_${shots}shot_full.jsonl \
        --model_name huggyllama/llama-7b
else
    echo 'unknown argument for method'
fi
As for the transformers version, I tried both 4.33.0 and 4.35.0 and hit the same problem.
By the way, the above error also occurs in the middle of evaluation when I use other models (such as Llama-2-7b). Here is part of the log:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
rouge-1: 0.310912, rouge-2: 0.118365, rouge-l: 0.260621
80%|███████████████████████████████████████████████████████████████████▋ | 796/1000 [1:12:14<18:08, 5.33s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
rouge-1: 0.310952, rouge-2: 0.118289, rouge-l: 0.260724
80%|███████████████████████████████████████████████████████████████████▋ | 797/1000 [1:12:19<18:08, 5.36s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
80%|███████████████████████████████████████████████████████████████████▋ | 797/1000 [1:12:23<18:26, 5.45s/it]
Traceback (most recent call last):
File "/data1/H2O-main/h2o_hf/run_summarization.py", line 137, in <module>
output_sequences = model.generate(
File "/usr/local/miniconda3/envs/atom/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data1/LLM/transformers/src/transformers/generation/utils.py", line 1719, in generate
return self.sample(
File "/data1/LLM/transformers/src/transformers/generation/utils.py", line 2837, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
While searching for solutions, I found this issue. Is it possible that this error is related to the beam sampling used in the generation process?
Hi, I tested samples 795 to 800 but didn't encounter the same error.
Based on the error message, could you try specifying "pad_token_id=tokenizer.eos_token_id" in the model.generate() call?
Thanks for your patience, but specifying "tokenizer.pad_token_id=tokenizer.eos_token_id" still doesn't solve the problem. Since I couldn't come up with a better solution, I just skipped sample 797 in the end.
Also, I noticed that you set 'temperature=0.3, top_p=1, do_sample=True' in the model.generate() call in h2o_hf/run_summarization.py. Is there any particular reason for these parameter settings? Just wondering.
Hi, I followed the original HELM setup for these parameters. Generally, a larger temperature brings more diversity and makes generation less deterministic.
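For intuition, temperature divides the logits before the softmax: a low value like 0.3 concentrates probability mass on the top token, while higher values flatten the distribution. A small stdlib-only sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before normalizing; a lower
    # temperature sharpens the distribution, a higher one flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.3)  # close to greedy decoding
flat = softmax_with_temperature(logits, 1.5)   # more diverse sampling
```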
Sorry to bother you again. In the h2o_hf/data directory, there are several different jsonl files for the XSUM dataset. To reproduce the result in Figure 4 of the paper (i.e., ROUGE-2 of 12 for llama-7b), which jsonl file should I use? I noticed that the contents of xsum_5shot.jsonl and xsum.jsonl are quite different, so I'm a little confused about that.
Hi everyone, I have another question regarding reproducing the XSUM results. In h2o_hf/scripts/summarization/eval.sh, a fixed HH_SIZE and RECENT_SIZE are set, but the x-axis of Figure 4 represents KV Cache Budget (%). What is the relationship between the sizes and the percentage? The total number of tokens varies with each sample, right?
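One plausible mapping (not confirmed in this thread): choose HH_SIZE and RECENT_SIZE per sample as fractions of the prompt length, so that the two together make up the reported budget percentage. A hypothetical sketch:

```python
def cache_sizes_from_budget(prompt_len: int, budget_pct: float,
                            hh_frac: float = 0.5) -> tuple:
    # Hypothetical interpretation: the KV cache budget is a percentage of
    # the prompt length, split between heavy-hitter and recent tokens.
    budget = int(prompt_len * budget_pct / 100)
    hh_size = int(budget * hh_frac)
    return hh_size, budget - hh_size

# A 2000-token prompt at a 20% budget -> 200 heavy-hitter + 200 recent slots.
```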
I used Llama-2-7b but I still get this error, with float16. I checked this piece of data: the prompt has 6768 tokens, so I guess the prompt length is too long and the model collapses.
Hi, I have also met the same bug when the generation process reaches 797/1000:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
So I tried to isolate sample 797 by modifying line #117 as
requests = requests[795:]
As expected, the bug occurs again at 2/205.
I then went to check the dataset, i.e., xsum_5shot.jsonl, and found that this sample's line is so long that my editor even skips tokenizing it ("Tokenization is skipped for long lines for performance reasons").
Obviously, the reason for the model collapse is that the prompt is too long.
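Given LLaMA-7B's 2048-token context window, a 6768-token prompt overflows the positions the model was trained on, which can produce non-finite logits. One simple mitigation sketch (a hypothetical helper, not the repo's code; note that truncating from the left may cut few-shot examples):

```python
def truncate_prompt(token_ids: list, max_ctx: int, gen_len: int) -> list:
    # Keep only the most recent tokens so prompt + generated tokens fit
    # inside the model's context window (2048 for LLaMA-7B).
    keep = max_ctx - gen_len
    return token_ids[-keep:] if len(token_ids) > keep else token_ids
```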
Hi, thanks for your great work! I have some questions about reproducing the XSUM results. I tried to run this command in the h2o_hf dir:
# Full baseline on XSUM
shots=5
GPU=0
bash scripts/summarization/eval.sh xsum ${shots} full ${GPU}
I tested all 1000 samples in xsum_5shot.jsonl with the LLaMA-7B model, but the ROUGE-2 result I got is only about 9%. According to Figure 4 in the paper, the full baseline for XSUM with LLaMA-7B is 12%. I can't figure out the reason. Would you please give me some advice? Thanks a lot!
Hi, I also used huggyllama/llama-7b to run the XSUM task and reached the same conclusion as you:
rouge-1: 0.267594, rouge-2: 0.098886, rouge-l: 0.222643
Do you have any ideas about this?
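One thing worth ruling out is the ROUGE implementation itself: tokenization, stemming, and lowercasing choices differ across packages and can shift scores by a few points. For reference, a minimal whitespace-token ROUGE-N F1 (a sketch, not the repo's scorer):

```python
from collections import Counter

def rouge_n(reference: str, candidate: str, n: int = 2) -> float:
    # Minimal ROUGE-N F1: lowercased whitespace tokens, no stemming.
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```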
Hi, I also want to know the relationship between HH_SIZE/RECENT_SIZE and the KV Cache Budget (%) in Figure 4. Do you have any ideas?