Tushar-N / pytorch-resnet3d

I3D Nonlocal ResNets in Pytorch

I have a question about 64f in the dataloader of the code. #7

Closed: seominseok0429 closed this 5 years ago

seominseok0429 commented 5 years ago

Hi. I am a university student in South Korea. I'm confused about part of the code, so I'm leaving a question. How did you load videos from the dataloader when you trained? For example, to use 64 frames (64f) from a video that has 200 frames, do you draw the 64 frames at regular intervals, or in some other way?

Tushar-N commented 5 years ago

L69 in the dataloader has the details.

if self.split=='train': # random sample
    offset = np.random.randint(0, len(imgs)-self.clip_len)
    imgs = imgs[offset:offset+self.clip_len]
elif self.split=='val': # center crop
    offset = len(imgs)//2 - self.clip_len//2
    imgs = imgs[offset:offset+self.clip_len]

For training, draw 64 consecutive frames starting at a random location. For validation, take the center 64 frames. For multi-crop testing, draw 64 frames at regular intervals. See the corresponding sample() function.
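Roughly, the test-time idea is something like this (a simplified sketch with an illustrative function name and clip count; see the actual sample() in the dataloader for the exact code):

    import numpy as np

    def sample_test_clips(imgs, clip_len=64, n_clips=10):
        # Sketch: draw n_clips clips of clip_len consecutive frames each,
        # with clip start offsets spaced at regular intervals across the
        # video. Illustrative only, not the repo's exact sample() code.
        max_offset = len(imgs) - clip_len
        offsets = np.linspace(0, max_offset, n_clips).astype(int)
        return [imgs[o:o + clip_len] for o in offsets]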

seominseok0429 commented 5 years ago

Thank you very much for your kind reply. Can I ask you an i3d question unrelated to the code?

Tushar-N commented 5 years ago

> Can I ask you an i3d question unrelated to the code?

Sure, I can try to help.

seominseok0429 commented 5 years ago

I would think it's better to draw 64 frames uniformly across the whole video. Why did you pick 64 consecutive frames starting from a random position instead?

Tushar-N commented 5 years ago

The 3D convolutions operate over time as well, so the temporal stride between input frames matters. Uniformly sampling 64 frames across the whole video would give a temporal stride that varies with video length, so clips from short and long videos would look temporally different to the network. In 2D video models (TSN, for example), frame-wise predictions are pooled over time, so other sampling strategies make sense there.
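To make the stride point concrete, here's a quick illustration (the frame counts are just example numbers):

    # Temporal stride between sampled frames for videos of different
    # lengths, under uniform vs. consecutive sampling.
    clip_len = 64
    for n_frames in (200, 1000):
        uniform_stride = n_frames / clip_len   # varies with video length
        consecutive_stride = 1                 # fixed, regardless of length
        print(f'{n_frames} frames: uniform stride ~{uniform_stride:.1f}, '
              f'consecutive stride {consecutive_stride}')

A 200-frame video gives a uniform stride of about 3, while a 1000-frame video gives about 16, so the network would see very different motion speeds from video to video.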