Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 33/33 [00:12<00:00, 2.57it/s]
0it [00:02, ?it/s]
Traceback (most recent call last):
File "evaluate.py", line 225, in
fire.Fire(main)
File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "evaluate.py", line 187, in main
output, logit = evaluate(instructions, inputs)
File "evaluate.py", line 148, in evaluate
generation_output = model.generate(
File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 912, in generate
outputs = self.base_model.generate(**kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1532, in generate
return self.greedy_search(
File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2356, in greedy_search
outputs = self(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 691, in forward
outputs = self.model(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 579, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 294, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 208, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 66.00 MiB (GPU 0; 23.69 GiB total capacity; 20.74 GiB already allocated; 18.94 MiB free; 22.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Hello, I really appreciate your work! I finished running finetune_rec.py, but a CUDA out-of-memory error occurred when I started running evaluate.py. My bash script is below:
CUDA_ID=$1
output_dir=$2
model_path=$(ls -d $output_dir*)
base_model=llama-7b-hf
test_data=data/movie/test.json
for path in $model_path
do
    echo $path
    CUDA_VISIBLE_DEVICES=$CUDA_ID python evaluate.py \
        --base_model $base_model \
        --lora_weights $path \
        --test_data_path $test_data \
        --result_json_data $2.json
done
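The error message itself suggests setting max_split_size_mb to avoid fragmentation, so one thing I'm considering is setting PYTORCH_CUDA_ALLOC_CONF at the top of evaluate.py, before torch is imported. A rough sketch of what I mean (128 MiB is just a guessed value, not a recommendation):

import os

# Allocator hint from the OOM message. It must be set before torch
# initializes its CUDA allocator, hence before `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the environment variable is set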
I'm not sure what the problem is.
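I also wondered whether loading the base model in 8-bit would cut memory enough to get past this. Something like the following is what I have in mind; this is just a sketch with placeholder paths (and it assumes bitsandbytes is installed), not the repo's actual loading code:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Sketch: 8-bit weights take roughly half the GPU memory of fp16
# (requires the bitsandbytes package). Paths are placeholders.
model = LlamaForCausalLM.from_pretrained(
    "llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    model,
    "path/to/lora_checkpoint",  # placeholder for one of my saved adapters
    torch_dtype=torch.float16,
)
model.eval()

Or is the real issue that too many instructions go into a single model.generate call, so the KV cache (the torch.cat on past_key_value in the traceback) outgrows the 23.69 GiB card? Hopefully you can give me a hand!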