e-apostolidis / PGL-SUM

A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization" (IEEE ISM 2021)

My video has very long duration. Is it possible to inference? #6

Closed o0oooo0o closed 2 years ago

o0oooo0o commented 2 years ago

Hi, thanks for sharing your work!

I'm impressed with your model, so I want to generate a summary of a drama video that I have. The video is about an hour and 30 minutes long and about 3–5 gigabytes in size.

I already tried it on this site (http://multimedia2.iti.gr/videosummarization/service/start.html ), but it failed because of the file size. So, if I download your model and run inference myself, do you think I can successfully get a summary of my video? (I know that the videos in the dataset used in your experiments are about 5 minutes long.)

mpalaourg commented 2 years ago

Hi @o0oooo0o , we have not experimented with such long videos, neither at the training nor at the inference stage of our model. However, I can't see any reason why you couldn't summarize your custom video(s). All of the needed logic is in inference.py; you'll have to create your own h5 file (instead of the given h5 files for the TVSum and SumMe datasets) for the custom video(s) you want to run inference on and compute a summary for.

Firstly, to calculate change_points you must segment your video(s) into shots. The videos of the used datasets were segmented using the KTS algorithm, but it's not necessary to use the same method. n_frames is the number of frames in the video. Furthermore, in our scripts we down-sampled every video to 2 fps for calculation efficiency, so picks contains the positions of the sampled frames in the original video. features contains the feature vectors of the sampled video frames, extracted by a pretrained GoogleNet model. Lastly, you don't need evaluate_summary; instead, return the bounds of the selected shots after running generate_summary.
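As a sketch of what creating that h5 file might look like (the dataset names follow the structure described in the README; all values below are hypothetical placeholders — in practice features must come from GoogleNet and change_points from a real shot segmentation such as KTS):

```python
import h5py
import numpy as np

# Hypothetical placeholder values for a short video; replace with real data.
n_frames = 300                                # frames in the original video
picks = np.arange(0, n_frames, 15)            # positions of the sampled frames (~2 fps)
features = np.random.rand(len(picks), 1024)   # one GoogleNet feature vector per sampled frame
change_points = np.array([[0, 99], [100, 199], [200, 299]])  # shot bounds (frame indices)

with h5py.File("custom_dataset.h5", "w") as hdf:
    group = hdf.create_group("video_1")
    group.create_dataset("n_frames", data=n_frames)
    group.create_dataset("picks", data=picks)
    group.create_dataset("features", data=features)
    group.create_dataset("change_points", data=change_points)
```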

Pseudocode to produce the video summary from those bounds:

import cv2
import numpy as np

for idx, video_name in enumerate(requested_videos):
    video_file_name = f"{video_dir}/{video_name}.mp4"
    try:
        cap = cv2.VideoCapture(video_file_name)
        fps = cap.get(cv2.CAP_PROP_FPS)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

        count = 0
        frames = np.zeros((total_frames, height, width, 3), dtype="uint8")
        while cap.isOpened():
            ret, frame = cap.read()     # Frames are read in BGR order
            if not ret:
                break
            frames[count] = frame
            count += 1
        cap.release()
    except Exception as e:
        raise Exception(f"Can't load video {video_file_name}\n{repr(e)}")

    # all_segments[idx] holds the [start, end] frame bounds of the selected shots
    summary_frames = []
    for selected_shot in all_segments[idx]:
        curr_summary_frames = frames[selected_shot[0]:selected_shot[1] + 1]
        summary_frames.append(curr_summary_frames)
    summary_frames = np.concatenate(summary_frames)

    writer = cv2.VideoWriter(f"{out_path}/{video_name}.mp4",
                             cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
    for frame in summary_frames:
        writer.write(frame)
    writer.release()
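The all_segments bounds used above can be derived from the model's binary frame-level summary. A minimal sketch, assuming a hypothetical helper summary_to_shot_bounds that takes a 0/1 selection vector over the original video frames:

```python
import numpy as np

def summary_to_shot_bounds(frame_summary):
    """Turn a binary frame-selection vector into inclusive [start, end] bounds."""
    selected = np.asarray(frame_summary, dtype=int)
    # Pad with zeros so segments touching either end are detected too.
    diffs = np.diff(np.concatenate(([0], selected, [0])))
    starts = np.where(diffs == 1)[0]        # 0 -> 1 transitions open a segment
    ends = np.where(diffs == -1)[0] - 1     # 1 -> 0 transitions close it
    return list(zip(starts, ends))

# e.g. summary_to_shot_bounds([0, 1, 1, 0, 1]) -> [(1, 2), (4, 4)]
```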

More info about the h5 file properties and shapes can be found in the README file.

o0oooo0o commented 2 years ago

Thank you! I will try it and contact you again if I have another question :)

o0oooo0o commented 2 years ago

Hi, I succeeded in getting the result, thanks to your kind comment :) Your model summarized our one-hour drama into about 10 minutes.

However, we have not prepared the user summary yet, so we cannot evaluate the result. That will be done soon.

mpalaourg commented 2 years ago

Hi @o0oooo0o, I am really glad that our model helped you summarize your drama video! With the above pseudocode you can visualize the summary and get a qualitative result, without having a user summary in hand.

PS. We have released the code of our next paper, entitled "Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames", and you can find it here. If you want, you can produce another summary (the steps are the same) and compare the results.