generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
The following error is shown: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.18 GiB (GPU 0; 79.35 GiB total capacity; 46.59 GiB already allocated; 11.25 GiB free; 66.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
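The traceback itself suggests tuning the allocator when reserved memory far exceeds allocated memory. As a first, low-effort mitigation one could set `PYTORCH_CUDA_ALLOC_CONF` before launching the script; the `128` MiB value below is only an illustrative starting point, not a recommendation from this report.

```shell
# Limit allocator block splitting to reduce fragmentation, as hinted by the
# error message. 128 MiB is an illustrative value; tune for your workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```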
### Expected behavior / 期待表现
The input is about 30k tokens long, and the model is already loaded across two GPUs, yet it still reports out-of-memory.
Any help would be appreciated. Thanks!
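A back-of-envelope estimate shows why a ~30k-token prompt can exhaust even two 80 GiB cards: the decoder KV cache grows linearly with sequence length. The model configuration below (80 layers, 64 KV heads, head dim 128, fp16) is hypothetical, chosen only to illustrate the scale for a large model without grouped-query attention.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   dtype_bytes=2, batch_size=1):
    """Rough size of the decoder KV cache: one K and one V tensor per layer."""
    per_layer = batch_size * seq_len * num_kv_heads * head_dim * dtype_bytes
    return 2 * num_layers * per_layer  # factor 2: keys + values

# Hypothetical 70B-class config, fp16 (dtype_bytes=2):
gib = kv_cache_bytes(seq_len=30_000, num_layers=80,
                     num_kv_heads=64, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # prints 73.2 GiB -- the cache alone, before weights
```

With weights also resident, a single 20 GiB allocation failing on GPU 0 is consistent with this arithmetic; reducing batch size, using a model with grouped-query attention, or spreading the cache across devices are the usual levers.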
### System Info / 系統信息
CUDA 11.7, transformers 4.37.2, Python 3.10
### Who can help? / 谁可以帮助到您?
No response
### Information / 问题信息
### Reproduction / 复现过程
1. GPU: DGX-A800-80G
2. `export CUDA_VISIBLE_DEVICES=1,2`
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
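Since the report says the model is already split over two cards, one thing worth checking is whether the automatic placement leaves headroom for the long-prompt KV cache. A hedged sketch, assuming a `transformers` + `accelerate` setup: cap each GPU with `max_memory` so `device_map="auto"` does not pack weights to the brim. The helper name, the headroom value, and `model_path` are all illustrative, not from the original report.

```python
def build_max_memory(num_gpus, total_gib=80, headroom_gib=10):
    """Cap each visible GPU so weight placement leaves room for activations
    and the KV cache of a long (~30k-token) prompt. Values are illustrative."""
    return {i: f"{total_gib - headroom_gib}GiB" for i in range(num_gpus)}

max_memory = build_max_memory(num_gpus=2)  # two A800-80G cards visible
print(max_memory)  # prints {0: '70GiB', 1: '70GiB'}

# Hypothetical usage with transformers (model_path is a placeholder):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, device_map="auto", max_memory=max_memory, torch_dtype="auto")
```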