Thanks for your great work!
I'm trying to modify the image training code for video captioning fine-tuning, but some things aren't clear to me, such as how to use the "answer" parameter in the MPLUG model.
Could you please release a training framework for this task?
I'm using the vatex_video_caps_dataset class to load my dataset.
I think I've figured it out: I modified the dataset and the training call to pass the real caption as the "answer". Is that the right way?
If so, I can create a pull request for you to add this.
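To make the question concrete, here is a rough sketch of the idea, not the actual change and not the MPLUG API. The class name `CaptionAsAnswerDataset`, the `load_frames` callback, the `collate` helper, and the annotation keys `video_path` and `caption` are all hypothetical names I made up for illustration. The only point is that each sample carries the ground-truth caption under the "answer" key.

```python
from typing import Any, Callable, Dict, List, Sequence


class CaptionAsAnswerDataset:
    """Hypothetical wrapper that pairs each video with its ground-truth caption.

    Assumption: each annotation is a dict with "video_path" and "caption" keys.
    """

    def __init__(
        self,
        annotations: Sequence[Dict[str, str]],
        load_frames: Callable[[str], Any],
        prompt: str = "",
    ) -> None:
        self.annotations = list(annotations)
        self.load_frames = load_frames  # hypothetical frame loader, supplied by the caller
        self.prompt = prompt            # optional text prompt fed alongside the video

    def __len__(self) -> int:
        return len(self.annotations)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        ann = self.annotations[index]
        return {
            "video": self.load_frames(ann["video_path"]),
            "question": self.prompt,
            # The real caption is used as the training target ("answer").
            "answer": ann["caption"],
        }


def collate(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group per-sample fields into lists so the train step can read batch["answer"]."""
    return {key: [sample[key] for sample in batch] for key in ("video", "question", "answer")}
```

In the train step I would then pass `batch["answer"]` wherever the image training code passes the answer text. Whether the model expects it tokenized first is something I have not confirmed.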