facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.
Apache License 2.0

Changing the temporal dimension #23

Open MLDeS opened 9 months ago

I want to change the temporal dimension to ~6 and use only the first few Hiera blocks, starting from one of the hub video models. To do this, I changed the patch embed (the Conv3D part) to use a smaller temporal stride instead of 4. After that, I only want 3 Hiera blocks, but following the modified patch embed with Hiera.blocks[0:3] throws an error. Running inference with return_intermediates=True also complains that a mask argument is required, which I assume is because a mask ratio is needed when requires_grad=True?

What would be the best way to do this? Should I call each Hiera block separately after a custom patch embed like the one described above?
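
For context, the temporal size after the patch embed follows the usual strided-convolution output formula, floor((n + 2p - k) / s) + 1. Below is a minimal sketch of how the token grid changes with the temporal stride. The input size, kernel, and padding values are illustrative assumptions, not the confirmed settings of any hub model, and `conv_out_size` and `token_grid` are hypothetical helper names.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output length of a strided convolution along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def token_grid(input_size, kernel, stride, padding):
    """Token grid (T, H, W) produced by a Conv3d-style patch embed."""
    return tuple(
        conv_out_size(n, k, s, p)
        for n, k, s, p in zip(input_size, kernel, stride, padding)
    )


# Assumed, illustrative values (not taken from the model config).
input_size = (16, 224, 224)  # frames, height, width
kernel = (3, 7, 7)
padding = (1, 3, 3)

# Compare a few temporal strides while keeping the spatial stride at 4.
for t_stride in (4, 3, 2):
    grid = token_grid(input_size, kernel, (t_stride, 4, 4), padding)
    print(f"temporal stride {t_stride}: token grid {grid}")
```

Under these assumed values, a temporal stride of 3 yields 6 temporal tokens. Anything sized for the original grid, such as learned position embeddings, would presumably also need to match the new temporal size, which may be related to the error when reusing the pretrained blocks directly.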