Closed DK-Jang closed 5 years ago
Thank you for looking into our paper :)
Sorry, I forgot to put the temporal clipping part in the code. You can add the following lines to the forward function of agent.py:
import random  # needed for the sampling below

temp_lens = [64, 56, 48, 40]            # candidate clip lengths T
tlen = temp_lens[random.randint(0, 3)]  # randomly pick one of the four lengths
start = random.randint(0, 64 - tlen)    # random start inside the 64-frame window
inputs = [x[:, :, start:start+tlen] for x in inputs]    # crop along the temporal axis
targets = [x[:, :, start:start+tlen] for x in targets]
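For placement, a minimal sketch of how this could sit in the forward pass; the method signature and the self.training gate are assumptions for illustration, not the repo's actual agent.py:

import random

def forward(self, inputs, targets):
    # hypothetical signature; adapt to the real forward() in agent.py
    if self.training:  # clip only during training; keep full length at test time
        temp_lens = [64, 56, 48, 40]
        tlen = temp_lens[random.randint(0, 3)]
        start = random.randint(0, 64 - tlen)
        inputs = [x[:, :, start:start+tlen] for x in inputs]
        targets = [x[:, :, start:start+tlen] for x in targets]
    # ... rest of the forward pass runs on the cropped tensors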
In general, this temporal clipping augmentation is not crucial.
Hi, since you only trained on sequences of these lengths, is it possible to extract features from a very long input sequence? For example, a one-minute video, which would have 1000+ frames. Thanks.
It's OK to use long sequences as input, since the network is fully convolutional. Also note that we use ReflectedPadding for all conv layers, which we found beneficial when applying the system to long sequences. With ZeroPadding, there would indeed be artifacts on long sequences.
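As a hedged illustration (PyTorch, with made-up channel counts rather than the paper's actual architecture), reflection padding mirrors the sequence ends instead of inserting zeros, so a block trained on 64-frame clips runs unchanged on arbitrarily long input:

import torch
import torch.nn as nn

# Toy fully convolutional 1D block with reflection padding
# (sizes are illustrative, not the paper's architecture).
block = nn.Sequential(
    nn.ReflectionPad1d(3),           # pad by mirroring the sequence ends
    nn.Conv1d(64, 64, kernel_size=7),
)

short = torch.randn(1, 64, 64)       # a 64-frame training clip
long = torch.randn(1, 64, 1500)      # e.g. a one-minute, 1000+ frame video
print(block(short).shape)            # torch.Size([1, 64, 64])
print(block(long).shape)             # torch.Size([1, 64, 1500])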
Hello, I am very interested in your paper :) While looking at your code, I can't find the "temporal clipping" part (paper, page 7) used for data augmentation. I would like to know where the implementation of "in every iteration we randomly select the temporal length from the set T ∈ {64, 56, 48, 40}" is.
Thank you.