KaiyangZhou / vsumm-reinforce

AAAI 2018 - Unsupervised video summarization with deep reinforcement learning (Theano)
MIT License

TypeError: 'KeysView' object does not support indexing #3

Closed vinaysworld closed 6 years ago

vinaysworld commented 6 years ago

Hi, can you help me with this line? What modification should I do?

    Traceback (most recent call last):
      File "vsum_train.py", line 155, in <module>
        train_dataset_path=args.dataset)
      File "vsum_train.py", line 83, in train
        key = dataset_keys[index]
    TypeError: 'KeysView' object does not support indexing

KaiyangZhou commented 6 years ago

Hi, I think you are using Python 3, where dictionary.keys() returns a view that cannot be indexed. What you can do is replace line 63 with

    #dataset_keys = dataset.keys()
    dataset_keys = list(dataset.keys())
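
In case it is useful, here is a minimal standalone illustration of the Python 3 behaviour (h5py's keys() returns a KeysView; the file name demo.h5 is just an example):

    import h5py

    # build a tiny h5 file just for the demo
    with h5py.File('demo.h5', 'w') as f:
        f.create_dataset('video_1/features', data=[0.0])
        f.create_dataset('video_2/features', data=[0.0])

    with h5py.File('demo.h5', 'r') as dataset:
        keys = dataset.keys()   # a KeysView under Python 3
        # keys[0]               # TypeError: 'KeysView' object does not support indexing
        keys = list(keys)       # materializing the view restores indexing
        print(keys[0])          # 'video_1'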

Thanks for spotting this. I have updated this in the scripts. Let me know if you have further issues.

vinaysworld commented 6 years ago

Hi, I have completed the whole process and got the log file with the final results. As I read in your paper, you used video 18 of the TVSum dataset to generate a summary and importance scores; can I also do this? As I am a student and new to neural networks, could you show me how to give a video as input and get the important (summarized) frames, plus all the necessary graphs? Thank you.

KaiyangZhou commented 6 years ago

If you want to get the raw output of the summarization network (i.e. the importance scores, Fig. 3 in our paper), you need to save the output of probs = net.model_inference(data_x) at this line (e.g. to a new h5 file).

To get the summarized frames (Fig.2 in our paper), you can save machine_summary = vsum_tools.generate_summary(probs, cps, n_frames, nfps, positions) (this line). machine_summary is a binary vector indicating which frames are included in the summary.

Hope this helps.
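
To make that concrete, here is a rough sketch of writing both outputs to a new h5 file (the stand-in arrays, the file name result.h5, and the key names are assumptions for illustration, not part of the repo):

    import h5py
    import numpy as np

    # stand-ins for the real outputs; in practice these come from
    # probs = net.model_inference(data_x) and
    # machine_summary = vsum_tools.generate_summary(probs, cps, n_frames, nfps, positions)
    probs = np.random.rand(100).astype('float32')      # per-frame importance scores
    machine_summary = (probs > 0.5).astype('float32')  # binary keyframe vector

    with h5py.File('result.h5', 'w') as f:
        f.create_dataset('video_1/score', data=probs)
        f.create_dataset('video_1/machine_summary', data=machine_summary)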

AdarshMJ commented 6 years ago

I'm not sure I understand what you mean by saving machine_summary = vsum_tools.generate_summary(probs, cps, n_frames, nfps, positions).

Is there a provision to give one video file as input via the command line? Could you please elaborate on this?

KaiyangZhou commented 6 years ago

@AdarshMJ For example, if the input video has 5 frames and machine_summary = [0, 0, 1, 1, 0], the positions with value 1 mean those frames are keyframes (the summary). You need to manually pick those frames (in this example, frame 3 and frame 4).
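
If it helps, here is a sketch of extracting those frames from a video with OpenCV (OpenCV itself and the path input.mp4 are assumptions, not part of this repo):

    import numpy as np
    import cv2  # OpenCV, assumed to be installed separately

    machine_summary = np.array([0, 0, 1, 1, 0])
    keyframes = set(machine_summary.nonzero()[0])  # indices of the selected frames

    cap = cv2.VideoCapture('input.mp4')  # hypothetical input video
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keyframes:
            cv2.imwrite('keyframe_%05d.jpg' % idx, frame)  # save each keyframe
        idx += 1
    cap.release()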

AdarshMJ commented 6 years ago

Thank you so much, I got it. I wanted to know whether those indices actually represent the frame numbers, or whether they are just indices?

KaiyangZhou commented 6 years ago

@AdarshMJ The values represent whether the frames are selected. To find those indices, you can do something like summary.nonzero(). I have just updated the code to include the visualization tool, so you can visualize the score vs. the gtscore.
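
In case the built-in tool does not fit your setup, here is a quick matplotlib sketch of the same score-vs-gtscore plot (the random arrays are placeholders; in practice load score from your result file and gtscore from the dataset h5 file):

    import numpy as np
    import matplotlib.pyplot as plt

    # placeholder data for illustration only
    score = np.random.rand(100)    # machine-predicted importance scores
    gtscore = np.random.rand(100)  # ground-truth importance scores

    plt.plot(score, label='machine score')
    plt.plot(gtscore, label='ground-truth score')
    plt.xlabel('frame index')
    plt.ylabel('importance score')
    plt.legend()
    plt.show()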

AdarshMJ commented 6 years ago

Thank you so much, I will check it out. I wanted to know how to create the h5 file for my own dataset. I checked the links and readme.txt, but there are so many parameters that have to be included in the h5 file, like gt scores and so on. Is it possible to update your code to generate this data? That would be helpful.


KaiyangZhou commented 6 years ago

It might be unnecessary to do this, because the way you generate the image features and the ground-truth scores/summaries will be different and depends on your purpose. You can follow this once you have those data.

AdarshMJ commented 6 years ago

Okay, got it.

    f.create_dataset(name + '/features', data=data_of_name)
    f.create_dataset(name + '/gtscore', data=data_of_name)
    f.create_dataset(name + '/user_summary', data=data_of_name)
    f.create_dataset(name + '/change_points', data=data_of_name)
    f.create_dataset(name + '/n_frame_per_seg', data=data_of_name)
    f.create_dataset(name + '/n_frames', data=data_of_name)
    f.create_dataset(name + '/picks', data=data_of_name)
    f.create_dataset(name + '/n_steps', data=data_of_name)
    f.create_dataset(name + '/gtsummary', data=data_of_name)
    f.create_dataset(name + '/video_name', data=data_of_name)

Of all these parameters, can I just have the video name, number of frames, and features as part of the data, or are all the parameters necessary? Which parameters need to be included?


KaiyangZhou commented 6 years ago

If you just want to train the policy network, you will need the features only. Please double-check vsum_train.py.
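
For example, a minimal training-only dataset file could look like this (the file name, the video key, and the random 1024-dim features are placeholders; in practice the features come from your CNN feature extractor):

    import h5py
    import numpy as np

    # one group per video, each holding an (n_steps x feature_dim) 'features' matrix
    features = np.random.rand(100, 1024).astype('float32')

    with h5py.File('my_dataset.h5', 'w') as f:
        f.create_dataset('video_1/features', data=features)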

AdarshMJ commented 6 years ago

I checked out your vsum_train.py code. In these lines:

    for index in indices:
        key = dataset_keys[index]
        # load the pre-extracted frame features for this video
        data_x = dataset[key]['features'][...].astype(_DTYPE)
        # pairwise Euclidean distance matrix between frame features
        L_distance_mat = cdist(data_x, data_x, 'euclidean')
        # pairwise dissimilarity matrix (1 - dot product) between frame features
        L_dissim_mat = 1 - np.dot(data_x, data_x.T)
        if ignore_distant_sim:
            # ignore similarity between temporally distant frames
            inds = np.arange(data_x.shape[0])[:,None]
            inds_dist = cdist(inds, inds, 'minkowski', 1)
            L_dissim_mat[inds_dist > distant_sim_thre] = 1
        # one training step of the policy network; blrwds holds a
        # moving-average baseline reward per video
        rewards = net.model_train(data_x, learn_rate, L_dissim_mat, L_distance_mat, blrwds[key])
        blrwds[key] = 0.9 * blrwds[key] + 0.1 * rewards.mean()
        epoch_reward += rewards.mean()

This means the training is done using only the features that have been extracted from the videos, right? It does not take into account the rest of the parameters like gtscore, user_summary, and so on?

KaiyangZhou commented 6 years ago

@AdarshMJ Yes, only the features are needed for training.