AlphaNext opened this issue 4 months ago
I ran into this problem too. It happens because inference.py does not pass the attention mask or the pad token id. The infer function in videollama2/__init__.py can be changed to:
```python
input_ids = tokenizer_MMODAL_token(prompt, tokenizer, modal_index, return_tensors='pt').unsqueeze(0).cuda()
# Build the attention mask explicitly: 1 for real tokens, 0 for padding.
attention_masks = input_ids.ne(tokenizer.pad_token_id).long().cuda()

# 3. generate response according to visual signals and prompts.
stop_str = conv.sep if conv.sep_style in [SeparatorStyle.SINGLE] else conv.sep2
# keywords = ["<s>", "</s>"]
keywords = [stop_str]
stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
do_sample = True

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        attention_mask=attention_masks,
        images_or_videos=tensor,
        modal_list=modals,
        do_sample=do_sample,
        temperature=0.2 if do_sample else 0.0,
        max_new_tokens=1024,
        use_cache=True,
        stopping_criteria=[stopping_criteria],
        pad_token_id=tokenizer.eos_token_id,
    )
```
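To make the fix concrete: the key line is `input_ids.ne(tokenizer.pad_token_id)`, which marks every non-padding position with 1 so the model ignores pad tokens during generation. Below is a minimal torch-free sketch of that masking logic using plain Python lists; `PAD_TOKEN_ID = 0` is a hypothetical value chosen for illustration (the real value comes from the tokenizer).

```python
# Hypothetical pad token id for illustration only; in practice use tokenizer.pad_token_id.
PAD_TOKEN_ID = 0

def build_attention_mask(input_ids):
    """Mirror input_ids.ne(pad_token_id).long(): 1 for real tokens, 0 for padding."""
    return [[0 if tok == PAD_TOKEN_ID else 1 for tok in seq] for seq in input_ids]

# Two sequences right-padded to length 5.
batch = [[5, 7, 9, 0, 0],
         [3, 4, 0, 0, 0]]
print(build_attention_mask(batch))  # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```

Without this mask (and an explicit `pad_token_id`), generate() has to guess which trailing tokens are padding, which is exactly the warning the original inference.py triggers.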
I tested this on my side and it works.
The inference code I used is below; I only changed the questions and the video paths. The specific question was: "Generate a brief and accurate for this video"
The suspicious log that keeps being printed is as follows: