baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[BUG] CUDA Out of Memory when eval model. #133

Open Crystalxd opened 12 months ago

Crystalxd commented 12 months ago

Required prerequisites

System information

conda environment torch=2.0.1 transformers=4.29.2 ...

Problem description

I used an A100 (80 GB) to run the evaluate_zh.py script to evaluate the baichuan model, but it consumed excessive GPU memory until it ran out (CUDA OOM). I then found that the model was loaded without being put into eval mode, and inference ran without no_grad.

Reproducible example code

The Python snippets:

https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L97C13-L97C13
Change this line to:
self.model = model.eval()

https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L103
Add on this line:
@torch.inference_mode()
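The two fixes above can be sketched together as follows. This is a minimal illustration with a tiny stand-in module (the real script loads baichuan-7B through transformers; the `run_eval` name is made up here):

```python
import torch
from torch import nn

# Hypothetical stand-in for the loaded model.
model = nn.Linear(4, 2)
model.eval()  # switch off training-only behavior such as dropout

@torch.inference_mode()  # no autograd graph, so activations are freed immediately
def run_eval(batch):
    return model(batch)

out = run_eval(torch.randn(3, 4))
print(model.training, out.requires_grad)  # False False
```

`torch.inference_mode()` is a stricter, slightly faster variant of `torch.no_grad()`; either one prevents the forward pass from retaining activations for backprop, which is what causes the memory blow-up during evaluation.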

Command lines:

Extra dependencies:

Steps to reproduce:


Traceback

No response

Expected behavior

No response

Additional context

No response


Guanze-Chen commented 10 months ago

Thank you. It works!!!

ICanFlyGFC commented 9 months ago

Thanks!

Guanze-Chen commented 9 months ago

Your email has been received; I will reply to you as soon as possible.

Young-X commented 1 month ago

While training the model, the script uses gpu0 by default. How do I switch it to gpu1?
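The thread does not answer this, but a standard way to pin a PyTorch script to GPU 1 is to mask the visible devices before CUDA is initialized (the `CUDA_VISIBLE_DEVICES` mechanism is standard CUDA practice, not something from this repository):

```python
import os

# Mask devices before torch initializes CUDA; the process then
# sees physical GPU 1 as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # import (or at least first CUDA use) must come after the mask

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

Equivalently, set the variable on the command line, e.g. `CUDA_VISIBLE_DEVICES=1 python train.py` (the script name here is hypothetical).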

Guanze-Chen commented 1 month ago

Your email has been received; I will reply to you as soon as possible.