Changing the temporal dimension

I want to change the temporal dimension to ~6 and use only the first few Hiera blocks, starting from a video model loaded from the hub. Accordingly, I changed the patch embed (the Conv3D part) to use a smaller temporal stride instead of 4. After that, I only want 3 Hiera blocks. However, following the modified patch embed with Hiera.blocks[0:3] throws an error. Running inference with return_intermediates=True also complains that a mask argument is required, which I assume is because a mask ratio is needed when requires_grad=True?

What would be the best way to do the above? Should I call each Hiera block separately after a custom patch embed like the one above?
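
For reference, here is a small sketch of how the temporal stride of the Conv3D patch embed changes the token grid that the later blocks receive. It only applies the standard convolution output-size formula. The input size, kernel, and padding below are assumed values for illustration, not something I have confirmed against the hub config, and `token_grid` is a hypothetical helper, not part of the Hiera API.

```python
# Sketch: token grid produced by a Conv3D patch embed for different temporal strides.
# All concrete sizes are assumptions chosen for illustration.

def conv_out_size(n, kernel, stride, padding):
    """Standard convolution output length along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

def token_grid(input_size, kernel, stride, padding):
    """Hypothetical helper: (T, H, W) token grid after the patch embed."""
    return tuple(
        conv_out_size(n, k, s, p)
        for n, k, s, p in zip(input_size, kernel, stride, padding)
    )

INPUT_SIZE = (16, 224, 224)  # assumed: frames, height, width
KERNEL = (3, 7, 7)           # assumed patch kernel
PADDING = (1, 3, 3)          # assumed patch padding

for t_stride in (4, 3, 2):
    grid = token_grid(INPUT_SIZE, KERNEL, (t_stride, 4, 4), PADDING)
    n_tokens = grid[0] * grid[1] * grid[2]
    print(f"temporal stride {t_stride}: grid {grid}, {n_tokens} tokens")
```

Under these assumptions, a temporal stride of 3 gives a temporal dimension of 6. My guess is that anything in the model sized from the original stride (for example the position embeddings) would no longer match the new grid, which may be why simply chaining the new patch embed with the first blocks fails.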

facebookresearch / hiera

Changing the temporal dimension #23