KaiyangZhou / pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)
MIT License
467 stars · 150 forks

Find features, change points, num_frames and positions for custom test video #21

Open mayank26saxena opened 5 years ago

mayank26saxena commented 5 years ago

Hi @KaiyangZhou,

I wanted to know how I can find the following for a custom test video in order to generate a summary:

- features
- change points
- num_frames
- positions

Please let me know!

SinDongHwan commented 5 years ago

If you've solved this, I'd like to know how to use KTS as well.

hungbie commented 5 years ago

Hi,

For change point detection, what should I feed into KTS? The flattened image (an H×W-dimensional vector), or the output of some feature extraction method so that each frame becomes an N-dimensional vector? What is used in this paper to preprocess the images/frames?

SinDongHwan commented 5 years ago

@hungbie Hi, you should input the features of each frame.
You can see how KTS is used in "utils/generate_dataset.py" here.
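To make the input format concrete, here is a minimal numpy sketch of the shape KTS expects: per-frame features stacked into an `(n_frames, dim)` matrix and turned into a frame-by-frame kernel matrix. The features below are random placeholders, and the `cpd_auto` import path is an assumption based on typical KTS layouts:

```python
import numpy as np

# Random placeholder for real CNN features: one row per frame.
n_frames, dim = 300, 1024
features = np.random.rand(n_frames, dim).astype(np.float32)

# KTS segments the video using a frame-by-frame similarity (kernel) matrix.
K = np.dot(features, features.T)   # (n_frames, n_frames), symmetric

# With the KTS code bundled alongside generate_dataset.py, change points
# would then be obtained roughly like this (import path is an assumption):
# from KTS.cpd_auto import cpd_auto
# change_points, _ = cpd_auto(K, n_frames // 30, 1)  # max segment count: heuristic

print(K.shape)  # (300, 300)
```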

hungbie commented 5 years ago

@SinDongHwan Thank you! I will take a look!

hungbie commented 5 years ago

> @hungbie Hi, you should input features of each frame. You can use KTS in "utils/generate_dataset.py" at this

I understand your approach. I have tried and come to the same thing using features from Googlenet or Resnet. However, I think in original KTS paper, the author used SIFT + Fisher vector to generate the descriptors. Have you tried this method?

SinDongHwan commented 5 years ago

> I understand your approach. I have tried and come to the same thing using features from Googlenet or Resnet. However, I think in original KTS paper, the author used SIFT + Fisher vector to generate the descriptors. Have you tried this method?

Yes, I tried SIFT + Fisher vectors but gave up. SIFT + Fisher vectors is indeed a method for generating features; I think the author used it because CNNs were not around when KTS was published.

hungbie commented 5 years ago

OK, I will try, but since SIFT is patented it's worth looking into other methods anyway. Thank you!

SinDongHwan commented 5 years ago

@hungbie Okay!! Good Luck^^

Harryjun commented 4 years ago

> @hungbie Hi, you should input features of each frame. You can use KTS in "utils/generate_dataset.py" at this

Hi @SinDongHwan, I used your code `generate_dataset.py` and found that the feature size is 2048. What should I do?

SinDongHwan commented 4 years ago

@Harryjun

I sent you an email just now, but I will also write here for anyone with the same question.

Hi, Harryjun~!! The published training datasets were generated with GoogleNet, but my code extracts features with ResNet, so the feature size is 2048. I tested this and got the following results:

1. I extracted features with GoogleNet and computed change points from them.
2. My GoogleNet features gave worse results than the public datasets (TVSum, SumMe, etc.), so I tried ResNet and got better results than with my own GoogleNet features. ResNet is deeper than GoogleNet, so extracting features and computing change points is a bit slower, but with batches you can make it faster.

If you want 1024-dimensional features from ResNet: I think an input size of (112, 112) would extract 1024 features, but that is just my guess.^^

Good Luck!!

Harryjun commented 4 years ago

@SinDongHwan Hi, I find that building the h5 file is very slow: it processes about one frame per second. Did you run into this problem, and can you give me any suggestions?

SinDongHwan commented 4 years ago

@Harryjun Hi, I could only tell you exactly if I saw your setup, but my guess is that you are running out of main memory. When memory runs out, swapping data in and out is slow, so building the h5 file becomes slow too.

Can you check your memory usage while the code runs? If that is the cause, you can try two methods (just my ideas ^^).

1st method: split the code so that it extracts all features for change-point detection and every 15th feature for the training dataset.

- First, extract all features, then compute change points.
- Second, separately extract every 15th feature for the training dataset.

2nd method: extract all features once, then select every 15th.

- First, extract all features, then compute change points.
- Second, select every 15th feature from the full set for the training dataset.
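The 2nd method boils down to a couple of numpy lines: extract features once, build the KTS kernel from all of them, and slice every 15th row for training (random placeholder features below):

```python
import numpy as np

# Placeholder for all per-frame features from a CNN.
n_frames = 101
features = np.random.rand(n_frames, 2048).astype(np.float32)

# All frames go into the KTS kernel matrix ...
# K = np.dot(features, features.T)

# ... but only every 15th frame goes into the training set,
# matching the 'picks' convention of the public datasets.
picks = np.arange(0, n_frames, 15)   # [0, 15, 30, ..., 90]
train_features = features[picks]

print(train_features.shape)  # (7, 2048)
```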

Do you use Hangouts? I'd like to look at your setup over TeamViewer.

Harryjun commented 4 years ago

@SinDongHwan I made some changes in this repo: we cannot afford to save that many frames, so we only save the features (training only uses the features) and at the end build the summary by reading frames back with OpenCV. The reason is that saving all the frames is too expensive. https://github.com/Harryjun/pytorch-vsumm-reinforce

```
# First, make the dataset
python video_forward2.py --makedatasets --dataset data_our/data_h5/data1.h5 --video-dir data_video/data1/ --frm-dir data_our/frames

# Second, compute scores and generate the summary
python3 video_forward2.py --makescore --model log/summe-split0/model_epoch1000.pth.tar --gpu 0 --dataset data_our/data_h5/data2.h5 --save-dir logs/videolog/ --summary --frm-dir data_our/frames
```

SinDongHwan commented 4 years ago

@Harryjun You're right, you can't save that many frames. When I made my dataset I tried to save all the frames and it was slow, so I removed the line of code that saves frames.

Harryjun commented 4 years ago

@SinDongHwan Hi, I find that the change points I get from KTS are not the same as in the datasets the author provides. What is the reason?

SinDongHwan commented 4 years ago

@Harryjun Yes, they are not the same. I haven't solved that either. TT But the results were not bad when using ResNet.

Harryjun commented 4 years ago

@SinDongHwan Hi, I want to ask you something. Recently we have been working on video keyframe extraction, so I ran a test with DSN and found that it neglects some frames, which is not very good. How would you extract key frames from long videos? Can you give me any suggestions? Thank you very much.

SinDongHwan commented 4 years ago

@Harryjun How long are the videos? I think you can get good results if you have proper change points. I've read many papers about video summarization, but I'm not a video summarization researcher, just a computer engineer, so I can't suggest a great idea.

I think you can get good results if you read many papers and think hard about how to improve. Good Luck~!! You can do it!

Swati640 commented 4 years ago

@Harryjun @SinDongHwan My change points differ a lot from those in the provided H5 file. For example, in the provided H5 file for video 1, consecutive change points are about 100 frames apart. KTS in "utils/generate_dataset.py" gives me different results, so my network just selects the starting frames when generating the video summary. Can you please help me figure out how to fix the change points?

SinDongHwan commented 4 years ago

@Swati640, @Harryjun I think you should ask the author of the paper or the creator of the dataset how to get change points similar to those in the dataset.

Harryjun commented 4 years ago

@Swati640 @SinDongHwan You can send the author an email to get suggestions; if you find a solution, please tell me, thanks! Two thoughts. First, a different network or different parameters will produce different change points. Second, the author first averages the score within every shot (the interval between two change points) and then keeps the highest-scoring shots. We could instead build the summary from the highest-scoring frames within each pair of change points: for example, with shots [0, 23] and [23, 50], take key frames within [0, 23] and select 0.15 * 23 of them. That way every shot is represented. You can try it.
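The per-shot scheme described above can be sketched like this: average the model's frame scores within each change-point interval, then keep whole shots until the ~15% budget is filled. The greedy selection below is my simplification; I believe the original evaluation code fills the budget with a knapsack solver instead.

```python
import numpy as np

def greedy_summary(scores, change_points, ratio=0.15):
    """Pick whole shots with the highest mean score until the summary
    reaches `ratio` of all frames. Greedy stand-in for a knapsack-style
    selection."""
    n_frames = len(scores)
    shots = [(np.mean(scores[s:e + 1]), s, e) for s, e in change_points]
    budget = int(ratio * n_frames)
    summary = np.zeros(n_frames, dtype=np.int64)
    used = 0
    for mean_score, s, e in sorted(shots, reverse=True):
        length = e - s + 1
        if used + length <= budget:
            summary[s:e + 1] = 1
            used += length
    return summary

scores = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.8, 0.8])
cps = [(0, 1), (2, 4), (5, 7), (8, 9)]
print(greedy_summary(scores, cps, ratio=0.5))  # [0 0 1 1 1 0 0 0 1 1]
```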

Swati640 commented 4 years ago

@SinDongHwan @Harryjun If you have used GoogleNet for feature extraction, please let me know how you got 1024-dimensional features.

SinDongHwan commented 4 years ago

@Swati640 @Harryjun I tried to extract features using GoogleNet; I just used code found via a Google search ("googlenet feature extract").

Tell me your email and I will send it to you, but when I tried this I got bad results.

Swati640 commented 4 years ago

I tried as well; I got the dimensions to match, but very bad change-point results. I would like to compare my code with yours, so that would be really helpful. My email is s_sharma@rhrk.uni-kl.de. Thanks in advance :)

SinDongHwan commented 4 years ago

@Swati640 I sent you an email. GoogleNet is not in the latest version of torchvision, so you will have to add and edit some code while referring to my email. Good Luck^^

harvestlamb commented 4 years ago

@SinDongHwan @Harryjun Hi~ Thanks for your code. When I run it, I run into this problem:

```
File "video_forward2.py", line 236, in <module>
    from utils.generate_dataset import Generate_Dataset
ImportError: No module named generate_dataset
```

I don't know how to create a correct dataset from my own video with your code. Could you tell me some details? Thank you again~

SinDongHwan commented 4 years ago

@harvestlamb Hi~!!

To build a dataset from your video, you need the following fields:

1. 'features': the feature of every 15th frame
2. 'picks': the indices of the sampled frames (a step of 15)
3. 'n_frames': the number of frames in the video
4. 'fps': frames per second
5. 'change_points': shot/scene change points
   - To get change_points, you should use KTS.
   - Depending on which CNN you use, you will get different change_points. See generate_dataset.
6. 'n_frame_per_seg': the number of frames in each change-point interval

If you want to train with supervised learning, I think you also need ground truth ('0/1' labels for every 15th frame), and you have to decide on a labeling policy, because there is no single right ground truth for a summary.
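Put together, a minimal h5py sketch of writing those fields (the key names follow the public datasets; all values below are dummy placeholders):

```python
import h5py
import numpy as np

n_frames = 300
picks = np.arange(0, n_frames, 15)                  # every 15th frame
features = np.random.rand(len(picks), 1024).astype(np.float32)
change_points = np.array([[0, 99], [100, 199], [200, 299]])
n_frame_per_seg = change_points[:, 1] - change_points[:, 0] + 1

with h5py.File("my_dataset.h5", "w") as f:
    g = f.create_group("video_1")
    g["features"] = features            # CNN feature of every 15th frame
    g["picks"] = picks                  # indices of the sampled frames
    g["n_frames"] = n_frames
    g["fps"] = 30
    g["change_points"] = change_points  # from KTS
    g["n_frame_per_seg"] = n_frame_per_seg
```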

harvestlamb commented 4 years ago

@SinDongHwan Thank you very much. I tried your code and built my own dataset, then trained on it; it produces result.h5 (which only has the reward, not an F-score). I ran into this problem:

===> Evaluation
Traceback (most recent call last):
  File "video_summarization.py", line 224, in <module>
    main()
  File "video_summarization.py", line 129, in main
    evaluate(model, dataset, test_keys, use_gpu)
  File "video_summarization.py", line 167, in evaluate
    user_summary = dataset[key]['user_summary'][...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/oliver/anaconda3/envs/PY2/lib/python2.7/site-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'user_summary' doesn't exist)"

Evaluation fails. I think it is because I did not label the ground truth ('user_summary', 'gts_score' and 'gtsummary') in my dataset. Should I label '0/1' for these fields for every 15th frame of my videos? Could you give me some guidance on labeling them? (I have never labeled a dataset like this.) Thank you again! Best wishes~

SinDongHwan commented 4 years ago

@harvestlamb Hi. I missed that: 'user_summary' is needed for evaluation, but not for testing. 'user_summary' is labels collected from n people.

I have never labeled either. You would convert your video to frames and then assign '0/1' to every frame (or every 15th frame). You can refer to the SumMe or TVSum datasets.
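To make the shape concrete, a toy numpy sketch of what 'user_summary' looks like (one row of 0/1 labels per annotator):

```python
import numpy as np

n_users, n_frames = 3, 10
# Each row: one annotator's 0/1 keyframe labels for every frame.
user_summary = np.zeros((n_users, n_frames), dtype=np.float32)
user_summary[0, 2:5] = 1   # annotator 1 keeps frames 2-4
user_summary[1, 3:6] = 1   # annotator 2 keeps frames 3-5
user_summary[2, 2:4] = 1   # annotator 3 keeps frames 2-3

# Evaluation compares the machine summary against each row and
# aggregates the F-scores across annotators.
print(user_summary.shape)  # (3, 10)
```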

harvestlamb commented 4 years ago

@SinDongHwan Thank you very much. Following your guidance, I inspected the SumMe data:

```
video_1  : <HDF5 dataset "user_summary": shape (15, 4494), type "<f4">
video_10 : <HDF5 dataset "user_summary": shape (15, 9721), type "<f4">
video_11 : <HDF5 dataset "user_summary": shape (15, 1612), type "<f4">
video_12 : <HDF5 dataset "user_summary": shape (15, 950), type "<f4">
video_13 : <HDF5 dataset "user_summary": shape (15, 3187), type "<f4">
video_14 : <HDF5 dataset "user_summary": shape (15, 4608), type "<f4">
video_15 : <HDF5 dataset "user_summary": shape (17, 6096), type "<f4">
video_16 : <HDF5 dataset "user_summary": shape (15, 3065), type "<f4">
video_17 : <HDF5 dataset "user_summary": shape (15, 6683), type "<f4">
video_18 : <HDF5 dataset "user_summary": shape (17, 2221), type "<f4">
video_19 : <HDF5 dataset "user_summary": shape (17, 1751), type "<f4">
video_2  : <HDF5 dataset "user_summary": shape (18, 4729), type "<f4">
video_20 : <HDF5 dataset "user_summary": shape (17, 3863), type "<f4">
video_21 : <HDF5 dataset "user_summary": shape (15, 9672), type "<f4">
video_22 : <HDF5 dataset "user_summary": shape (15, 5178), type "<f4">
video_23 : <HDF5 dataset "user_summary": shape (15, 4382), type "<f4">
video_24 : <HDF5 dataset "user_summary": shape (15, 2574), type "<f4">
video_25 : <HDF5 dataset "user_summary": shape (16, 3120), type "<f4">
video_3  : <HDF5 dataset "user_summary": shape (15, 3341), type "<f4">
video_4  : <HDF5 dataset "user_summary": shape (15, 3064), type "<f4">
video_5  : <HDF5 dataset "user_summary": shape (15, 5131), type "<f4">
video_6  : <HDF5 dataset "user_summary": shape (16, 5075), type "<f4">
video_7  : <HDF5 dataset "user_summary": shape (15, 9046), type "<f4">
video_8  : <HDF5 dataset "user_summary": shape (17, 1286), type "<f4">
video_9  : <HDF5 dataset "user_summary": shape (15, 4971), type "<f4">
```

So user_summary's shape is (x, y). Obviously y is n_frames; does x represent the number of people who did the labeling? To make labeling easier, could I reduce the dimensionality, or reuse similar labels across the 15 rows?

SinDongHwan commented 4 years ago

@harvestlamb Hi. For video_1, for example, the 15 dimension is the number of people and 4494 is the number of frames; I think the values along the 4494 dimension are {0, 1}. It is not easy to label ground truth for every frame, so I have an idea: first, pick ground-truth frames among every 15th frame of your dataset; second, if e.g. the 30th frame is ground truth, label the frames near the 30th frame as "1", and label the frames that were not picked as "0".

This is just my idea; I have not tried it. I think you should decide on your own labeling policy.
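A minimal sketch of that labeling policy (the window size below is an arbitrary choice of mine):

```python
import numpy as np

def label_frames(n_frames, keyframes, window=7):
    """Mark a +/- `window` neighborhood around each chosen keyframe as 1,
    everything else as 0. Sketch of the labeling policy suggested above;
    the window size is an arbitrary assumption."""
    labels = np.zeros(n_frames, dtype=np.int64)
    for k in keyframes:
        lo, hi = max(0, k - window), min(n_frames, k + window + 1)
        labels[lo:hi] = 1
    return labels

labels = label_frames(100, keyframes=[30, 60])
print(labels[23:38])   # ones around frame 30
```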

harvestlamb commented 4 years ago

@SinDongHwan Hello, thank you for the prompt reply. I think your ideas are right and I have already implemented them (but I did not label gt_summary and gt_score; I think supervised learning needs these two labels to train the model. Is that right?). Thank you again~

SinDongHwan commented 4 years ago

@harvestlamb Hello, I don't know either, so I think you should ask the makers of the dataset how 'gt_summary' and 'gt_score' are generated. Good luck~!!^^

anaghazachariah commented 4 years ago

Hello. I implemented the project; you can refer to my repo: https://github.com/anaghazachariah/video_summary_generaton

huuuuyl commented 3 years ago

Please let me know how you did 1024-dimensional feature extraction with ResNet152.

SinDongHwan commented 3 years ago

Hi, @huuuuyl

The feature size of ResNet152 is 2048, so I think you should change the input feature size of the video summarization model from 1024 to 2048.
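A hedged sketch of what that change amounts to: the input size of the model's first recurrent layer must match the feature dimension. The `TinyDSN` class below is an illustrative stand-in, not the repo's actual model:

```python
import torch
import torch.nn as nn

class TinyDSN(nn.Module):
    """Illustrative stand-in for a DSN-style summarizer: a BiLSTM over
    frame features followed by a per-frame importance score in [0, 1]."""
    def __init__(self, in_dim=2048, hid_dim=256):  # 2048 for ResNet152
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hid_dim, batch_first=True,
                           bidirectional=True)
        self.fc = nn.Linear(hid_dim * 2, 1)

    def forward(self, x):              # x: (batch, n_frames, in_dim)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.fc(h)).squeeze(-1)

model = TinyDSN(in_dim=2048)
scores = model(torch.randn(1, 20, 2048))
print(scores.shape)  # torch.Size([1, 20])
```

In the repo itself, the equivalent change is passing the larger dimension wherever the model's input size is configured.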

Good Luck.

huuuuyl commented 3 years ago

@SinDongHwan thanks for your kind advice.

mohammedshady commented 7 months ago

> @Swati640 i sent you email. There is not GoogleNet in latest version of torchvision. So, you have to add and edit code while referring my email. Good Luck^^

Hey man, I hope you are still here 😅 I'm having the same issue: the change points I get when building my own dataset cause the F-score to drop drastically, and the gtscore also doesn't match the original dataset.

Here is my email: mohatech777@gmail.com