Open sky505 opened 1 week ago
Hi, this could be caused by many things, to better understand your situation, we need more information:
Hi, this could be caused by many things, to better understand your situation, we need more information:
- which framework did you use? did you enable flash attention?
- can each one of the concurrent requests run individually without error?
- how did you implement concurrent inference?
`
try:
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
except Exception as e:
error_msg = f"{e}"
print(f"input异常 --> {error_msg}")
raise e
# Inference
try:
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(f"qianwen模型输出结果 : {output_text}")
except Exception as e:
error_msg = f"{e}"
print(f"推理异常 --> {error_msg}")
raise e
out = ""
if isinstance(output_text, list):
if len(output_text) != 0:
out = output_text[0]
# 清空未使用的显存缓存
torch.cuda.empty_cache()
`
单个推理的情况下全部是正常的,就是并发推理出现异常
单个推理的情况下全部是正常的
If so, it is unlikely that you're facing an issue with the model.
Still, to check potential coding problems:
单个推理的情况下全部是正常的
If so, it is unlikely that you're facing an issue with the model.
Still, to check potential coding problems:
- which framework did you use? I see swift in the screenshot. if you're using swift, not just transformers, you may need to consider passing the issues to swift.
- how did you implement concurrent inference? the code you have shown is for a single request. I assume you didn't implement dynamic batching and use multi-threading?
1、如上我贴的代码就是使用的代码,也是报错"probability tensor contains either inf, nan or element < 0" 2、我贴的代码是个封装了推理,我在短时间内调用了5次(相同的视频)这个推理的方法,肯定会有4个报错,1个成功。但是我使用不同的视频,就是正常的 3、我是使用Flask框架实现的web服务,我只是简单的接受到视频地址和指令,传给推理的messages。
Provide MWE. Cannot reproduce using our own code with https://github.com/QwenLM/Qwen2-VL/blob/main/web_demo_mm.py.
同一个视频同时发起5次并发的推理,会一直报错:probability tensor contains either
inf
,nan
or element < 0 起初我本来以为是算力不足导致,但是我换成5个不一样的视频,推理,就正常。5个推理: 推理1:视频A + 图片A + 问题1 推理2:视频A + 图片B + 问题2 推理3:视频A + 图片C + 问题3 推理4:视频A + 图片D + 问题4 推理5:视频A + 图片E + 问题5