This gives the training command builder the ability to write out a CSV with one caption per video (with the `-n` flag).
This is needed to make inference on the test set work, since inference only uses the CSV as a list of videos. (As I understand it, inference ignores the captions in the CSV and pulls them from the JSON file?)
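
For reviewers, here is a minimal sketch of what "one caption per video" could look like. It is not the code in this PR: the column names `video_id` and `caption`, the helper names, and the choice to keep the first caption seen for each video are all assumptions made for illustration.

```python
import csv
import io


def one_caption_per_video(rows, key="video_id"):
    """Keep only the first row seen for each video, preserving input order.

    `key` is a hypothetical column name; the real CSV may use another one.
    """
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def dedupe_csv_text(text, key="video_id"):
    """Return CSV text with at most one row per video."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []  # header row, read up front
    rows = one_caption_per_video(reader, key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "video_id,caption\nv1,a cat\nv1,a kitten\nv2,a dog\n"
    print(dedupe_csv_text(sample), end="")
```

If inference really does treat the CSV only as a list of videos, which caption survives should not matter, so keeping the first one is the simplest choice.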