
Weakly-Supervised-Dense-Video-Captioning

This repo attempts to implement Weakly Supervised Dense Video Captioning in TensorFlow; it is not complete yet.

Requirements

Usage

Guide

  1. extract_frames.py: Uniformly samples 30 frames from each video.
  2. load_data.py: Creates the label vectors and the word dictionary.
  3. Res_video_bag.py: Lexical FCN (ResNet-50) with a frame as an instance.
  4. lexical_Res.py: Lexical FCN (ResNet-50) with a region as an instance.
  5. region_selection.py: Region sequence generator; it can currently form one region sequence.
  6. dic/: Where to put ix2word, word2ix, and word_counts.
  7. frames/: Where to put frames extracted by extract_frames.py.
  8. MSRVTT/: Where to put the training/testing labels and the region sequences generated by region_selection.py.
  9. videos/: Where to put the MSR-VTT videos.
  10. Weight_Resnet50/: Where to put weights saved by lexical_Res.py.
  11. Weight_Resnet50_vasbag/: Where to put weights saved by Res_video_bag.py.
  12. TRY3/s2vt_train.py: Language model using S2VT (training).
  13. TRY3/s2vt.py: S2VT model graph.
  14. TRY3/s2vt_inference.py: Language model using S2VT (inference).
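The uniform sampling in step 1 can be sketched as below. `uniform_frame_indices` is a hypothetical helper for illustration; the exact spacing and rounding used in extract_frames.py may differ:

```python
import numpy as np

def uniform_frame_indices(total_frames, num_samples=30):
    """Pick `num_samples` evenly spaced frame indices from a video with
    `total_frames` frames (a sketch, not necessarily extract_frames.py's
    exact scheme)."""
    if total_frames <= 0:
        return []
    # Evenly spaced positions over [0, total_frames - 1], floored to
    # valid integer frame indices; short videos yield repeated indices.
    positions = np.linspace(0, total_frames - 1, num_samples)
    return np.floor(positions).astype(int).tolist()
```

The returned indices can then be used to grab frames with a decoder such as OpenCV's `VideoCapture`, keeping exactly 30 frames regardless of video length.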
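Step 2 builds the word dictionaries (word2ix, ix2word, word_counts) and label vectors. A minimal sketch follows; the `<pad>` token, the index ordering, and the multi-hot labeling scheme here are assumptions for illustration, not necessarily what load_data.py implements:

```python
from collections import Counter

def build_vocab(captions, min_count=1):
    """Build word2ix / ix2word / word_counts from caption strings.
    The <pad> token at index 0 is an assumed convention."""
    word_counts = Counter()
    for caption in captions:
        word_counts.update(caption.lower().split())
    words = sorted(w for w, c in word_counts.items() if c >= min_count)
    vocab = ['<pad>'] + words  # reserve index 0 for padding (assumed)
    word2ix = {w: i for i, w in enumerate(vocab)}
    ix2word = {i: w for w, i in word2ix.items()}
    return word2ix, ix2word, word_counts

def multi_hot_label(captions, word2ix):
    """Multi-hot vector marking which vocabulary words occur in any
    caption of a video -- the weak, video-level supervision used to
    train a Lexical FCN."""
    label = [0] * len(word2ix)
    for caption in captions:
        for w in caption.lower().split():
            if w in word2ix:
                label[word2ix[w]] = 1
    return label
```

Under weak supervision, only these video-level word labels are available; no word is tied to a particular frame or region, which is what the multiple-instance treatment in steps 3-5 addresses.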

Reference

Contact

Shih-Chen Lin (dennis60512@gmail.com)

Any discussions and suggestions are welcome!