facebookresearch / grounded-video-description

Video Grounding and Captioning

Pre-trained models using Transformer for unsupervised mode? #13

Closed leobxpan closed 5 years ago

leobxpan commented 5 years ago

Hi, is there a pre-trained Transformer model available for unsupervised mode? Thanks.

LuoweiZhou commented 5 years ago

@CaesarPan We do not have a pre-trained Masked Transformer model, as this part of the code is for reference only. The implementation here is a variant of the original model: i) our Transformer encoder is applied to region features, and ii) our Transformer decoder takes the last hidden state from the Transformer encoder and, in some cases, the frame-wise video feature, whereas in Masked Transformer the hidden states from all Transformer encoders (on top of frame-wise features) are fed into the Transformer decoder. For pre-trained Transformer models for captioning, you can refer to this new repo: https://github.com/LuoweiZhou/VLP
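To illustrate the difference described above, here is a minimal toy sketch in plain Python. It is not code from this repo or from Masked Transformer: the layer function and all names (`make_toy_layer`, `run_encoder`, `memory_last_layer`, `memory_all_layers`) are hypothetical, and it assumes "all Transformer encoders" means the outputs of every encoder layer. The optional frame-wise video feature is left out for brevity.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]
Layer = Callable[[Sequence], Sequence]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for one encoder layer: scales every feature value."""
    def layer(x: Sequence) -> Sequence:
        return [[scale * v for v in vec] for vec in x]
    return layer


def run_encoder(x: Sequence, layers: List[Layer]) -> List[Sequence]:
    """Apply the layers in order and keep the hidden states after each one."""
    states = []
    for layer in layers:
        x = layer(x)
        states.append(x)
    return states


def memory_last_layer(states: List[Sequence]) -> List[Sequence]:
    """Variant described for this repo: the decoder sees only the final encoder output."""
    return [states[-1]]


def memory_all_layers(states: List[Sequence]) -> List[Sequence]:
    """Masked Transformer as described: the decoder sees every encoder layer's output."""
    return list(states)


# Toy inputs: two feature vectors (regions here, frames in Masked Transformer).
features = [[1.0, 2.0], [3.0, 4.0]]
layers = [make_toy_layer(2.0), make_toy_layer(0.5), make_toy_layer(3.0)]

states = run_encoder(features, layers)
last_only = memory_last_layer(states)   # one set of hidden states
all_layers = memory_all_layers(states)  # one set per encoder layer
```

In this sketch the only difference between the two variants is how many sets of encoder hidden states are handed to the decoder; how the decoder combines them is outside its scope.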