Description

This outputs a list of commands, one per video, to run inference. Run python build_inference_command.py -h for usage instructions.

A few notes:

The README mentions the model GIT_BASE_MSRVTT_QA but doesn't mention a non-QA counterpart. I'm not sure whether simply removing the prefix (as the README suggests) is enough to turn it into a captioning task.
The setting 'type': 'test_git_inference_single_image' also seems suspect, but it's what the README says to use.
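
For reference, here is a minimal sketch of how one command per video might be assembled from the settings discussed above. It is not the actual contents of build_inference_command.py. The module path generativeimage2text.inference and the -p flag are taken from my reading of the README and should be treated as assumptions, as should the idea that the parameter string may be JSON-quoted. The function names and the shape of the videos mapping are hypothetical.

```python
import json
import shlex


def build_inference_command(frame_paths, model_name="GIT_BASE_MSRVTT_QA", prefix=""):
    """Build one shell command that runs inference over the frames of one video.

    Assumption: the inference entry point accepts a dict-like parameter string
    via -p, as the README examples suggest. An empty prefix is what the README
    describes for captioning rather than QA.
    """
    params = {
        "type": "test_git_inference_single_image",
        "image_path": list(frame_paths),
        "model_name": model_name,
        "prefix": prefix,
    }
    # shlex.quote keeps the parameter string as a single shell argument.
    return "python -m generativeimage2text.inference -p " + shlex.quote(json.dumps(params))


def build_commands(videos, **kwargs):
    """Return one command per video; `videos` maps a video id to its frame paths."""
    return [build_inference_command(frames, **kwargs) for _, frames in sorted(videos.items())]


if __name__ == "__main__":
    # Hypothetical frame lists, purely for illustration.
    example = {
        "video0": ["frames/video0/0.jpg", "frames/video0/1.jpg"],
        "video1": ["frames/video1/0.jpg"],
    }
    for command in build_commands(example):
        print(command)
```

Whether an empty prefix with the QA model really yields captions is exactly the open question raised above; the sketch only shows where that choice would live.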

TSV files vs many many calls to the program

The README mentions that a TSV file can be used to designate multiple images for inference tasks. This feels like a better solution than calling this program thousands of times, but...

It's not clear that training supports TSV files. I looked at the code a bit, and the answer might be no. That seems odd and impractical, though.
It's also not clear whether a TSV file can be used for video captioning inference; the README seems to discuss it only in terms of single-image captioning. I have not had a chance to look at this code.
The TSV structure they describe would have us base64-encoding ALL of the images into a TSV file. This would be huge and, again, seems impractical.
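
To put a rough number on the size concern, here is a small sketch that writes a row in the layout I understand the README to describe (key, tab, base64-encoded image bytes) and estimates the resulting file size. The one-row-per-frame layout, the helper names, and the default key length are all assumptions for illustration. The only hard fact used is that base64 turns every 3 input bytes into 4 output characters.

```python
import base64


def tsv_row(key, image_bytes):
    """Format one TSV row: image key, a tab, then the base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"{key}\t{encoded}\n"


def base64_size(n_bytes):
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def estimate_tsv_bytes(n_videos, frames_per_video, avg_frame_bytes, key_bytes=16):
    """Estimate TSV size assuming one row per frame (an assumption, not confirmed)."""
    per_row = key_bytes + 1 + base64_size(avg_frame_bytes) + 1  # key, tab, payload, newline
    return n_videos * frames_per_video * per_row


if __name__ == "__main__":
    # Hypothetical dataset shape: 10,000 videos, 6 frames each, ~50 kB per JPEG frame.
    total = estimate_tsv_bytes(10_000, 6, 50_000)
    print(f"{total / 1e9:.1f} GB")
```

With those made-up numbers the estimate comes to about 4 GB, roughly a third larger than the raw frames, which is the overhead base64 always adds.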

SanniM3 / video_summarisation_git

git Feature/frame list creation #1
