Congrats! This is nice work on zero-shot captioning.
The paper reports zero-shot video captioning results on MSR-VTT, Activity-Net, etc. However, I couldn't find the code or pretrained models in this repo to reproduce those results. Will the models and instructions for video captioning be released?
Thanks a lot!