Video captioning inference error

Hello, I'm using your code from model_caption_mplug_vatex.py. Inference fails with the error shown after the snippet. What could be the problem?

video = open_video('videos/mixkit-young-woman-running-with-mask-on-the-street-5352-small.mp4').reshape(-1, 8, 3, 448, 448)
test_caption = [config['prompt'] + config['eos']] * video.size(0)
test_caption = tokenizer(
    test_caption,
    padding='longest',
    truncation=True,
    max_length=25,
    return_tensors="pt"
).to('cuda')

with torch.no_grad():
    topk_ids, topk_probs = model(video, test_caption, None, train=False, device='cuda')

RuntimeError: forward() expected at most 2 argument(s) but received 3 argument(s). Declaration: forward(torch.multimodal.model.multimodal_transformer.VisualTransformer self, Tensor input) -> Tensor
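
To make the argument counts in that message easier to read: as far as I understand, a TorchScript declaration counts `self`, so "at most 2" would mean the visual encoder accepts a single tensor, and "received 3" would mean it was called with two positional arguments. The sketch below reproduces that counting in plain Python. `StandInVisualEncoder` and `count_call_args` are hypothetical names made up for illustration, not AliceMind or PyTorch code, and the string arguments are placeholders for real tensors.

```python
import inspect


class StandInVisualEncoder:
    """Hypothetical stand-in mirroring the declaration in the error: forward(self, input)."""

    def forward(self, input):
        return input


def count_call_args(method, *args):
    """Return (max accepted, received), both counting `self` as the error message appears to."""
    # inspect.signature on a bound method omits `self`, so add it back.
    params = inspect.signature(method).parameters
    return len(params) + 1, len(args) + 1


encoder = StandInVisualEncoder()
# Two positional arguments are passed where the declaration allows only one.
accepted, received = count_call_args(encoder.forward, "video", "extra_argument")
print(f"expected at most {accepted} argument(s) but received {received} argument(s)")
```

If this reading is right, the mismatch would be between how the model code calls its visual encoder and the encoder version that is actually loaded, rather than in the arguments passed to `model(...)` itself.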

alibaba / AliceMind
