extract_features does not produce features for all videos due to error in calculation of num_iterations

cdeepakroy commented 6 years ago

I was trying to extract video descriptors from my videos using tools/extract_features.py.

I found the number of features in the output pickle file to be less than the number of videos in the lmdb file created using data/create_video_db.py.

After a bit of debugging, I suspect there is an error in the way num_iterations is calculated here in the ExtractFeatures function.

examples_per_iteration = args.batch_size * num_gpus
num_iterations = int(num_examples / examples_per_iteration)

For example, suppose the lmdb file contains 84 videos and we use batch_size=4 and num_gpus=2. The current code shown above sets num_iterations = int(84 / (4 * 2)) = 10, so the output pickle file contains only num_iterations * examples_per_iteration = 10 * 8 = 80 feature vectors instead of 84.

To rectify this, num_iterations should be calculated as follows:

num_iterations = int(np.ceil(float(num_examples) / examples_per_iteration))

This will set num_iterations to 11 instead of 10 and produce 88 feature vectors. I am trying to figure out how to ignore the 4 extra/dummy feature vectors it produces, potentially based on the video_id.
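
The arithmetic above can be checked with a small sketch. The helper name iteration_counts is hypothetical and is not part of extract_features.py; it only reproduces the floor-based and ceiling-based calculations discussed in this issue, assuming every iteration processes exactly batch_size * num_gpus examples.

```python
import math


def iteration_counts(num_examples, batch_size, num_gpus):
    """Compare floor- and ceiling-based iteration counts (hypothetical helper)."""
    examples_per_iteration = batch_size * num_gpus
    # Current behaviour: the remainder is silently dropped.
    floor_iters = num_examples // examples_per_iteration
    # Proposed behaviour: round up so that every example is covered.
    ceil_iters = int(math.ceil(float(num_examples) / examples_per_iteration))
    return {
        "floor_iters": floor_iters,
        "dropped": num_examples - floor_iters * examples_per_iteration,
        "ceil_iters": ceil_iters,
        "extra": ceil_iters * examples_per_iteration - num_examples,
    }


# 84 videos, batch_size=4, num_gpus=2
print(iteration_counts(84, 4, 2))
```

With the floor-based count, 4 videos get no features; with the ceiling-based count, all 84 are covered but 4 surplus outputs have to be discarded afterwards.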

cdeepakroy commented 6 years ago

I just tried running the extract_features.py script with the correction proposed above on an lmdb file with 70 videos, using batch_size=16 and num_gpus=2, and got the following video_id values in the pickle file:

[35 61 36 50 48  7 21 38 39 37 66 47 53 52 41 63 17 20 40 51 19  3  1 26
 23 59 14 29 69 46 15 34 28 56 58 64 24  0  4 45 65  8 30 12 43 62 31 55
 16 33 68 11 22 18 25 27  6 57  2 54  9 60 44  5 14 29 69 46 15 34 35 61
 36 50 48  7 21 38 39 37 42 67 13 49 32 10 17 20 40 51 19  3  1 26 23 59]

Since num_iterations=3, the script produced 3 * 32 = 96 entries for 70 videos, and it seems to have duplicated the first 26 video_id values.

Does anyone have a clue what is happening here?

dutran commented 6 years ago

A simple fix is to use 1 more iteration (as you suggested), then ignore the redundant features. Note that one video can have multiple clips, so to have a correct alignment, you may want to use video_id as clip_id.
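
One way to ignore the redundant features is to keep only the first feature vector seen for each id. The sketch below is an assumption about how that could be done, not code from the repository: drop_redundant is a hypothetical helper, and it assumes the ids and features are parallel sequences and that each id identifies exactly one clip (i.e. video_id is used as clip_id).

```python
def drop_redundant(video_ids, features):
    """Keep the first feature seen for each id (hypothetical helper)."""
    seen = set()
    kept_ids, kept_features = [], []
    for vid, feat in zip(video_ids, features):
        if vid in seen:
            # A later occurrence of an id we already have: skip it.
            continue
        seen.add(vid)
        kept_ids.append(vid)
        kept_features.append(feat)
    return kept_ids, kept_features


# Toy data: ids 3 and 1 appear twice because of the extra iteration.
ids = [3, 1, 2, 3, 1]
feats = [[0.1], [0.2], [0.3], [0.4], [0.5]]
unique_ids, unique_feats = drop_redundant(ids, feats)
print(unique_ids, unique_feats)
```

Keeping the first occurrence is an arbitrary choice; it preserves the original extraction order and leaves one feature per id.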

cdeepakroy commented 6 years ago

@dutran Thanks for the reply.

I compared the feature vectors that share the same (duplicated) clip_id, i.e. video_id, and they are not the same.

Also, I noticed that the feature vectors are slightly different on each run. Is this expected? Is there anything random in the feature extraction process? When it does a crop after resizing, does it compute the central crop (of size 112 x 112) or does it do a random crop in each call?

dutran commented 5 years ago

@cdeepakroy try this https://github.com/facebookresearch/R2Plus1D/pull/40

facebookresearch / VMZ

extract_features does not produce features for all videos due to error in calculation of num_iterations #30