PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with a 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: The mbart many-to-many model is about 5 GB after download; can't it run on an A40 with 44 GB of GPU memory? Running it reports out-of-memory #3334

Closed Amy234543 closed 1 year ago

Amy234543 commented 2 years ago

Please describe your question

place = "gpu" paddle.set_device(place) model_name = "mbart-large-50-many-to-many-mmt" tokenizer = MBartTokenizer.from_pretrained(model_name) model = MBartForConditionalGeneration.from_pretrained(model_name, src_lang="en_XX") model.eval() def postprocess_response(seq, bos_idx, eos_idx): """Post-process the decoded sequence.""" eos_pos = len(seq) - 1 for i, idx in enumerate(seq): if idx == eos_idx: eos_pos = i break seq = [ idx for idx in seq[:eos_pos + 1] if idx != bos_idx and idx != eos_idx ] res = tokenizer.convert_ids_to_string(seq) return res bos_id = tokenizer.lang_code_to_id["zh_CN"] eos_id = model.mbart.config["eos_token_id"]

inputs = "PaddleNLP is a powerful NLP library with Awesome pre-trained models and easy-to-use interface," input_ids = tokenizer(inputs)["input_ids"] input_ids = paddle.to_tensor(input_ids, dtype='int64').unsqueeze(0) with paddle.nograd(): outputs, = model.generate(input_ids=input_ids, forced_bos_token_id=bos_id,max_length=50,use_faster=False, use_fp16_decoding=False, )

result = postprocess_response(outputs[0].numpy().tolist(), bos_id, eos_id)

print("Model input:", inputs) print("Result:", result)

This is the code I tested; it fails with the out-of-memory error (error screenshot attached in the original issue).

gongel commented 2 years ago

Hi, this runs fine for me on a V100-32G, with GPU memory usage around 12 GB. Could another process on your GPU be holding memory? My PaddlePaddle version is 2.3.1.post101.
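
For reference, a quick way to rule out other processes holding GPU memory (a minimal sketch, assuming nvidia-smi is on the PATH; not a step from the thread itself):

import subprocess

# Print the full nvidia-smi report; the process table at the bottom
# lists every PID currently holding GPU memory on the machine.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)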

FrostML commented 2 years ago

Indeed, EB-level (exabyte-scale) GPU memory usage is not reasonable. Our judgment is that some value is fetched abnormally and comes back extremely large, causing a single op's memory footprint to spike.
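
To make that suspected failure mode concrete (an illustration with made-up numbers, not code from PaddleNLP): if a length scalar is read back garbled, a downstream op sizes its allocation from it and the request becomes astronomically large.

import paddle

# Hypothetical garbage value standing in for a corrupted length such as
# past_key_values_length; 2**40 int64 elements would be roughly 8 TB.
bad_len = 2 ** 40
# paddle.arange(0, bad_len)  # uncommented, this single request would OOM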

Could you check whether the error is raised once execution reaches here (the linked line)? Alternatively, run the following directly

import paddle

# Minimal repro: build the start index as an int32 tensor, then take one
# arange step from it; on a healthy install this returns the tensor [0].
a = paddle.to_tensor([0], dtype="int32")
paddle.arange(a, a + 1, dtype="int64")

and check whether it raises an error.

If it does, we suspect an environment problem; please also provide your machine's environment configuration.

You can also try changing the code here to

decoder_inputs_embed_pos = self.decoder_embed_positions(
            decoder_input_ids.shape, past_key_values_length.cuda())

to see whether that fixes it (the change explicitly places past_key_values_length on the GPU before the position-embedding lookup).

Amy234543 commented 2 years ago

Running

import paddle
a = paddle.to_tensor([0], dtype="int32")
paddle.arange(a, a+1, dtype="int64")

raises an error. Environment:

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
cuda 11.7
cudnn 8.4
paddlepaddle-gpu 2.3.2
paddlenlp: latest, cloned from git

@FrostML

I installed paddle with pip. Looking at the docs, is CUDA 11.7 only installable via Docker?
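
As a sanity check on the install itself, PaddlePaddle ships self-check utilities (a minimal sketch; a mismatch between the wheel's CUDA/cuDNN versions and the system's CUDA 11.7 toolkit would point at an environment problem):

import paddle

paddle.utils.run_check()         # end-to-end check that the GPU build works
print(paddle.version.cuda())     # CUDA version the installed wheel was built against
print(paddle.version.cudnn())    # cuDNN version the wheel was built against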

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.