JaywongWang / DenseVideoCaptioning

Official Tensorflow Implementation of the paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning" in CVPR 2018, with code, model and prediction results.
MIT License

Same Sentence Generated for All Videos #24

Closed nayyeraafaq closed 5 years ago

nayyeraafaq commented 5 years ago

Hi Wang,

Thanks for sharing the code. I have a couple of questions; could you please answer them one by one?

(1) When batch_size = 1, doesn't that mean the number of batches equals train_size? Then why does training show only 1000 batches?
(2) The model still runs without errors when I change batch_size to 32 (in all places). Where is the limitation?
(3) I keep getting the same sentence, "a man is seeing standing to the camera", with only small variations for all videos, both during training and at test time. Why?
(4) Why does it not compute BLEU, CIDEr and ROUGE scores, even though I can see they are enabled in the options?
(5) Can we use the model with a different video feature dimension, for instance 2K? At the moment it is 500.
(6) Do you use pre-trained word embeddings, or are they learnt with the model?

Thanks

JaywongWang commented 5 years ago

(1) Check 'metric_eval_num' in opt.py; it determines the number of items used for evaluation.
(2) model.py explicitly manipulates only the first item from the input batch (e.g., line 510).
(3) I'm not sure of the exact problem; it looks more like an optimization issue. Have you followed the same settings and used the same features?
(4) METEOR is the metric adopted by the official test server. It is probably the most reliable metric for dense video captioning.
(5) If you use the provided pre-trained model, the feature dimension must be the same.
(6) No. Words are represented as one-hot vectors and the embedding is learnt during training.
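A small sketch of point (6), for anyone unfamiliar with the idea: multiplying a one-hot word vector by a trainable embedding matrix is equivalent to looking up a row of that matrix, which is how the embedding is learnt from scratch during training. The names below are illustrative, not taken from this repository:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4

# Trainable embedding matrix, randomly initialised and updated by the
# optimizer together with the rest of the model (no pre-trained vectors).
embedding = rng.normal(size=(vocab_size, embed_dim))

word_id = 3
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# Multiplying by a one-hot vector selects a single row of the matrix...
via_matmul = one_hot @ embedding
# ...which is exactly what an embedding lookup does.
via_lookup = embedding[word_id]

assert np.allclose(via_matmul, via_lookup)
```

In TensorFlow this lookup is typically done with `tf.nn.embedding_lookup`, which avoids materialising the one-hot vectors at all.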

nayyeraafaq commented 5 years ago

Thanks.