This repo tries to implement Weakly Supervised Dense Video Captioning in TensorFlow, but it is not complete yet.

lexical_Res.py: for training the FCN with the MIML loss while saving the weights with the lowest loss (a loss sketch follows this list).
region_selection.py: for generating the most informative and coherent region sequence.
TRY3/model_seq2seq.py: for training the language model.
TRY3/s2vt_predict_v2.py: for running inference with the trained model.
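
I have not checked the exact formulation in lexical_Res.py, but a common MIML setup for this task treats each video as a bag of frame/region instances and each caption word as a label, aggregates the instance-level word probabilities with a noisy-OR, and applies binary cross-entropy against the video's multi-hot word vector. The sketch below assumes that formulation; miml_loss, bag_labels, and instance_probs are placeholder names, and tf.keras is used only for illustration.

```python
import tensorflow as tf

def miml_loss(bag_labels, instance_probs, eps=1e-6):
    """MIML loss for a batch of videos (bags) under a noisy-OR assumption.

    bag_labels:     (batch, vocab)            multi-hot word labels per video
    instance_probs: (batch, instances, vocab) per-frame/region word probabilities
    """
    bag_labels = tf.cast(bag_labels, instance_probs.dtype)
    # Noisy-OR aggregation: the bag emits a word if at least one instance emits it.
    bag_probs = 1.0 - tf.reduce_prod(1.0 - instance_probs, axis=1)
    bag_probs = tf.clip_by_value(bag_probs, eps, 1.0 - eps)
    # Binary cross-entropy between bag-level labels and aggregated probabilities.
    bce = -(bag_labels * tf.math.log(bag_probs) +
            (1.0 - bag_labels) * tf.math.log(1.0 - bag_probs))
    return tf.reduce_mean(tf.reduce_sum(bce, axis=-1))
```

Keeping only the weights with the lowest loss could then be handled with tf.keras.callbacks.ModelCheckpoint(save_best_only=True) or a manual checkpoint inside the training loop, depending on how the script is actually written.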

extract_frames.py: uniformly samples 30 frames from each video.
load_data.py: creates the label vectors and the word dictionary (see the first sketch after this list).
Res_video_bag.py: Lexical FCN (ResNet50) with a frame as an instance.
lexical_Res.py: Lexical FCN (ResNet50) with a region as an instance.
region_selection.py: region sequence generator, which can only form one region sequence for now (see the second sketch after this list).
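
To make "label vector and word dictionary" concrete, here is a small assumed sketch of what load_data.py might produce: a word-to-id dictionary built from the training captions and, per video, a multi-hot vector marking which dictionary words appear in its captions. The function names and whitespace tokenization are illustrative, not the repo's actual code.

```python
from collections import Counter

def build_word_dict(captions, min_count=1):
    """Map each word that appears at least min_count times to an integer id."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    words = [w for w, c in counts.items() if c >= min_count]
    return {w: i for i, w in enumerate(sorted(words))}

def label_vector(video_captions, word_dict):
    """Multi-hot vector over the vocabulary: 1 if the word occurs in any caption."""
    vec = [0] * len(word_dict)
    for cap in video_captions:
        for w in cap.lower().split():
            if w in word_dict:
                vec[word_dict[w]] = 1
    return vec

# Example: two toy captions for one video.
wd = build_word_dict(["a man is cooking", "a man cooks food"])
print(label_vector(["a man is cooking"], wd))
```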
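
region_selection.py is described above as generating the most informative and coherent region sequence. Its exact scoring is not shown here, but a simple greedy variant (one region per frame, trading off the region's summed word probability against its similarity to the previously chosen region) could look like this; region_word_probs, region_feats, and alpha are assumed inputs, with the probabilities coming from the Lexical FCN.

```python
import numpy as np

def select_region_sequence(region_word_probs, region_feats, alpha=0.5):
    """Greedy region-sequence selection, one region per frame.

    region_word_probs: (frames, regions, vocab) word probabilities per region
    region_feats:      (frames, regions, dim)   region descriptors for coherence
    alpha:             informativeness/coherence trade-off (assumed)
    """
    sequence, prev_feat = [], None
    for t in range(region_word_probs.shape[0]):
        info = region_word_probs[t].sum(axis=-1)          # informativeness per region
        if prev_feat is None:
            score = info
        else:
            feats = region_feats[t]
            cos = feats @ prev_feat / (
                np.linalg.norm(feats, axis=-1) * np.linalg.norm(prev_feat) + 1e-8)
            score = alpha * info + (1.0 - alpha) * cos    # add coherence with previous pick
        best = int(np.argmax(score))
        sequence.append(best)
        prev_feat = region_feats[t, best]
    return sequence
```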

s2vt_train.py: language model using S2VT (training).
s2vt.py: S2VT model graph (a rough sketch follows this list).
s2vt_inference.py: language model using S2VT (inference).
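
s2vt.py holds the actual model graph; the following is only a rough tf.keras sketch of the S2VT layout (two stacked LSTMs over a shared time axis of video steps followed by caption steps, trained with teacher forcing), with every dimension and name being an assumption rather than the repo's values.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_s2vt(feat_dim=2048, vocab=10000, hidden=512,
               n_video=30, n_caption=20, embed=512):
    """Rough S2VT-style graph with teacher forcing (all sizes are assumptions)."""
    video = tf.keras.Input((n_video, feat_dim), name="video_feats")
    words = tf.keras.Input((n_caption,), dtype="int32", name="caption_in")

    # First LSTM reads the video features, then zero padding during decoding steps.
    h1 = layers.LSTM(hidden, return_sequences=True)(
        layers.ZeroPadding1D((0, n_caption))(video))

    # Second LSTM reads the first LSTM's outputs concatenated with word
    # embeddings: zeros during encoding, caption tokens during decoding.
    emb = layers.ZeroPadding1D((n_video, 0))(layers.Embedding(vocab, embed)(words))
    h2 = layers.LSTM(hidden, return_sequences=True)(
        layers.Concatenate(axis=-1)([h1, emb]))

    # Only the caption half of the time axis predicts words.
    logits = layers.Dense(vocab)(layers.Cropping1D((n_video, 0))(h2))
    return tf.keras.Model([video, words], logits)
```

At inference time the caption input would be fed one token at a time with the previously predicted word, which is presumably what s2vt_inference.py and TRY3/s2vt_predict_v2.py do.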

Shih-Chen Lin (dennis60512@gmail.com)
Any discussions and suggestions are welcome!